text-generation-inference
deec30f8 - hotfix: avoid non-prefilled block use when using prefix caching (#2489)

Commit

1 year ago

hotfix: avoid non-prefilled block use when using prefix caching (#2489)

The minimum batch size logic could cause prefix blocks to be deallocated without prefill. The next allocation of the same prefix would then use garbage blocks.
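
The failure mode described in the commit message can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the code from text-generation-inference: `ToyPrefixCache`, `reuse_is_safe`, and the `only_cache_prefilled` flag are hypothetical names, and the guard shown is only one possible way to avoid the problem, not necessarily what the commit does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Hypothetical stand-in for a prefix-caching block allocator (not TGI's code)."""

    # Maps a prefix (tuple of token ids) to the block id holding its KV data.
    cached: dict = field(default_factory=dict)
    # Block ids whose KV data has actually been written by a prefill pass.
    prefilled: set = field(default_factory=set)
    next_block: int = 0

    def allocate(self, prefix):
        """Return (block_id, cache_hit) for the given prefix."""
        if prefix in self.cached:
            return self.cached[prefix], True
        block = self.next_block
        self.next_block += 1
        return block, False

    def mark_prefilled(self, block):
        """Record that a prefill pass has written real data into the block."""
        self.prefilled.add(block)

    def free(self, prefix, block, only_cache_prefilled):
        """Release a block, optionally keeping it reusable under its prefix."""
        if only_cache_prefilled and block not in self.prefilled:
            return  # never prefilled: do not expose it for prefix reuse
        self.cached[prefix] = block


def reuse_is_safe(only_cache_prefilled):
    """Simulate a request that is dropped before prefill, then retried."""
    cache = ToyPrefixCache()
    prefix = (1, 2, 3)
    block, _ = cache.allocate(prefix)
    # Assumed scenario: the request is removed from the batch before prefill
    # runs, so mark_prefilled() is never called for this block.
    cache.free(prefix, block, only_cache_prefilled)
    block2, hit = cache.allocate(prefix)
    # A cache hit is only safe if the block really holds prefilled data.
    return (not hit) or (block2 in cache.prefilled)
```

In this sketch, `reuse_is_safe(False)` reproduces the hazard: the second allocation hits the cache and receives a block that was never prefilled. With `reuse_is_safe(True)`, the unprefilled block is never registered under its prefix, so the retry gets a fresh block instead.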

References

#2489 - hotfix: avoid non-prefilled block use when using prefix caching

Author

danieldk

Parents

6cb42f49
