text-generation-inference
deec30f8 - hotfix: avoid non-prefilled block use when using prefix caching (#2489)
Committed 1 year ago
hotfix: avoid non-prefilled block use when using prefix caching (#2489)

The minimum batch size logic could cause prefix blocks to be deallocated before they had been prefilled. The next allocation of the same prefix would then reuse those blocks even though their contents were garbage.
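The commit message describes the failure mode but not the code. The toy allocator below is a hypothetical sketch of the invariant the fix protects: blocks should only become reusable under a prefix once they have actually been prefilled. The class and method names (`PrefixCachingAllocator`, `allocate`, `mark_prefilled`, `free`) are illustrative assumptions, not TGI's API, and the real allocator (shared blocks, reference counting, eviction, and the minimum batch size logic itself) is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Allocation:
    prefix: tuple[int, ...]   # token ids of the prefix
    blocks: list[int]         # cache block ids backing the prefix
    reused: bool              # True if the blocks came from the prefix cache
    prefilled: bool = False   # set once the block contents have been written


class PrefixCachingAllocator:
    """Hypothetical toy allocator with prefix caching (not TGI's API)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        # prefix -> block ids whose contents are known to be valid
        self.prefix_cache: dict[tuple[int, ...], list[int]] = {}

    def allocate(self, prefix: list[int]) -> Allocation:
        key = tuple(prefix)
        if key in self.prefix_cache:
            # Cache hit: the blocks were prefilled earlier, so they are valid.
            blocks = self.prefix_cache.pop(key)
            return Allocation(key, blocks, reused=True, prefilled=True)
        n = -(-len(key) // self.block_size)  # ceiling division
        if n > len(self.free_blocks):
            raise MemoryError("out of cache blocks")
        blocks = [self.free_blocks.pop() for _ in range(n)]
        return Allocation(key, blocks, reused=False)

    def mark_prefilled(self, alloc: Allocation) -> None:
        alloc.prefilled = True

    def free(self, alloc: Allocation) -> None:
        if alloc.prefilled:
            # Valid contents: publish them for the next request with this prefix.
            self.prefix_cache[alloc.prefix] = alloc.blocks
        else:
            # Never prefilled: the contents are garbage, so return the blocks
            # to the free pool instead of caching them under the prefix.
            self.free_blocks.extend(alloc.blocks)


alloc = PrefixCachingAllocator(num_blocks=8, block_size=4)
a = alloc.allocate([1, 2, 3, 4, 5])
alloc.free(a)                      # e.g. request dropped before prefill
b = alloc.allocate([1, 2, 3, 4, 5])
print(b.reused)                    # False: no garbage blocks treated as a cache hit
```

In this sketch the check lives in `free`: deallocation only publishes blocks to the prefix cache when the allocation was prefilled. Without that check, freeing `a` would have cached its unwritten blocks and `b` would have been handed them as a valid prefix, which is the kind of garbage-block reuse the commit describes.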
References: #2489 - hotfix: avoid non-prefilled block use when using prefix caching
Author: danieldk
Parents: 6cb42f49