transformers
47227f47 - Add prefix sharing to continuous batching (#42094)

Commit
75 days ago
Add prefix sharing to continuous batching (#42094)

* Fix a bug in the CB memory calculation
* Nit in example
* Replace _free_blocks with a proper object BlockManager
* Removed dead code
* Added hashing mechanism (wip)
* Added de-duplication
* Add de-initialization mechanism
* Add prefix detection
* Ensure we always keep 1 token for decode start
* Removed some todos and small fix
* Update src/transformers/generation/continuous_batching/cache.py
  Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Update src/transformers/generation/continuous_batching/continuous_api.py
  Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* DOCSSSS
* Review comments
* Style
* Added a flag to allow prefix sharing
* [IMPORTANT] bug fix for prefix length memoization
* Added a test for CB prefix sharing
* Example, start of refactor
* End of refactor for example script
* Added a do sample arg
* Added reporting on prefix sharing
* Added a context manager option for CB manager
* Nit and style
* Review comment from ArthurZucker

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
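
The hashing, de-duplication and prefix-detection steps listed in the commit message can be pictured with a small toy model. This is only a sketch of the general idea, not the code added by this commit: the class `ToyBlockManager`, its methods and the block size are hypothetical names chosen for illustration, and hash collisions are ignored. Each complete block of tokens is hashed together with the hash of the block before it, so a hash identifies the whole prefix up to that block; a request whose leading blocks hash to known values reuses those cache blocks instead of taking new ones. At least one token is always left outside the shared part, mirroring the "keep 1 token for decode start" item.

```python
class ToyBlockManager:
    """Hypothetical toy block manager with prefix sharing (illustration only)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.hash_to_block: dict[int, int] = {}  # prefix hash -> block id
        self.block_to_hash: dict[int, int] = {}  # reverse map, used when freeing
        self.ref_count: dict[int, int] = {}      # block id -> number of users

    def _take_free_block(self) -> int:
        if not self.free_blocks:
            raise RuntimeError("out of cache blocks")
        block = self.free_blocks.pop(0)
        self.ref_count[block] = 1
        return block

    def allocate(self, tokens: list[int]) -> tuple[list[int], int]:
        """Return (block ids, number of prompt tokens served by shared blocks)."""
        bs = self.block_size
        # Only complete blocks are shareable, and at least one token must
        # stay outside them so there is something left to run.
        shareable = (len(tokens) - 1) // bs
        blocks, reused_tokens, parent = [], 0, 0
        for i in range(shareable):
            chunk = tuple(tokens[i * bs:(i + 1) * bs])
            parent = hash((parent, chunk))  # chained hash covers the whole prefix
            block = self.hash_to_block.get(parent)
            if block is None:
                # Unknown prefix: take a fresh block and register its hash.
                block = self._take_free_block()
                self.hash_to_block[parent] = block
                self.block_to_hash[block] = parent
            else:
                # Known prefix: de-duplicate by sharing the existing block.
                self.ref_count[block] += 1
                reused_tokens += bs
            blocks.append(block)
        # The remaining tokens go to private, unhashed blocks.
        remaining = len(tokens) - shareable * bs
        for _ in range(-(-remaining // bs)):  # ceiling division
            blocks.append(self._take_free_block())
        return blocks, reused_tokens

    def free(self, blocks: list[int]) -> None:
        """Release a request's blocks; a shared block is freed with its last user."""
        for block in blocks:
            self.ref_count[block] -= 1
            if self.ref_count[block] == 0:
                del self.ref_count[block]
                prefix_hash = self.block_to_hash.pop(block, None)
                if prefix_hash is not None:
                    del self.hash_to_block[prefix_hash]
                self.free_blocks.append(block)


manager = ToyBlockManager(num_blocks=8, block_size=4)
first, reused_first = manager.allocate(list(range(9)))
second, reused_second = manager.allocate(list(range(8)) + [99, 100])
print(reused_first, reused_second)  # 0 8
```

The second request shares its first two blocks with the first request because both prompts start with the same eight tokens; only its last two tokens need a private block.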