text-generation-inference
deec30f8 - hotfix: avoid non-prefilled block use when using prefix caching (#2489)

Commit
1 year ago
hotfix: avoid non-prefilled block use when using prefix caching (#2489)

The minimum batch size logic could cause prefix blocks to be deallocated without prefill. The next allocation of the same prefix would then use garbage blocks.
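
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual text-generation-inference allocator: the class `ToyPrefixCache`, its methods, and the `prefilled` flag are all hypothetical names introduced for illustration. The idea it shows is that a block released before prefill ran should go back to the free pool rather than be registered under its prefix, so a later request for the same prefix does not reuse uninitialized contents.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Hypothetical model of a prefix cache; not TGI's real data structure."""

    num_blocks: int
    free_blocks: list = field(default_factory=list)
    cached: dict = field(default_factory=dict)  # prefix -> block id

    def __post_init__(self):
        # All blocks start out free and uninitialized.
        self.free_blocks = list(range(self.num_blocks))

    def allocate(self, prefix):
        """Return (block_id, reused).

        reused=True means the caller may skip prefill because the block
        is assumed to already hold valid data for this prefix.
        """
        if prefix in self.cached:
            return self.cached.pop(prefix), True
        return self.free_blocks.pop(), False

    def release(self, prefix, block_id, prefilled):
        """Release a block, caching it only if prefill actually ran."""
        if prefilled:
            self.cached[prefix] = block_id
        else:
            # Never prefilled: contents are garbage, so do not register
            # the block under its prefix.
            self.free_blocks.append(block_id)


cache = ToyPrefixCache(num_blocks=2)

# A request is allocated, then dropped before prefill.
block, reused = cache.allocate("system-prompt")
cache.release("system-prompt", block, prefilled=False)

# The same prefix must not be served from the cache now.
block2, reused2 = cache.allocate("system-prompt")

# Once prefill has run, the block can safely be reused.
cache.release("system-prompt", block2, prefilled=True)
block3, reused3 = cache.allocate("system-prompt")
```

In this sketch, dropping the `prefilled` check in `release` reproduces the symptom from the commit message: the second allocation would report `reused=True` for a block that was never filled.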