transformers
34108a22 - Continuous batching refactor (#40426)

Commit
113 days ago
Continuous batching refactor (#40426) * Rework of the CB example * Further rework of CB example * Refactor PA cache, slice on tokens, add debug prints -- WIP * Slice cache -- WIP * Added a mechanism to check batched outputs in CB script * Less logging, debug flag for slice, !better reset! -- WIP * QOL and safety margins * Refactor and style * Better saving of cb example * Fix * Fixes and QOL * Mor einformations about metrics * Further logging * Style * Licenses * Removed some comments * Add a slice input flag * Fix in example * Added back some open-telemetry deps * Removed some aux function * Added FA2 option to example script * Fixed math (all of it) * Added a simple example * Renamed core to classes * Made allocation of attention mask optionnal * Style
Author
Parents
Loading