Add prefix sharing to continuous batching (#42094)
* Fix a bug in the CB memory calculation
* Nit in example
* Replace _free_blocks with a proper object BlockManager
* Removed dead code
* Added hashing mechanism (wip)
* Added de-duplication
* Add de-initialization mechanism
* Add prefix detection
* Ensure we always keep 1 token for decode start
* Removed some todos and small fix
* Update src/transformers/generation/continuous_batching/cache.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Update src/transformers/generation/continuous_batching/continuous_api.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Docs
* Review comments
* Style
* Added a flag to allow prefix sharing
* [IMPORTANT] bug fix for prefix length memoization
* Added a test for CB prefix sharing
* Example, start of refactor
* End of refactor for example script
* Added a do sample arg
* Added reporting on prefix sharing
* Added a context manager option for CB manager
* Nit and style
* Review comment from ArthurZucker
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>