transformers
9f7faed2 - [CB] [Major] Asynchronous batching (#43960)

Commit
2 days ago
* Cleanup: batch is more self-contained
* Created utils.py file
* Moved pad to utils
* Pin memory for input and outputs
* Consolidate inputs into a bulk tensor
* Consolidated read and write indices
* Add the transfer_inputs fn
* Renames and getters
* Remove useless sync
* Move graphs to the IOs
* Async done except for carry_in_ids
* Add carry over (scheduler not picking up tho)
* Remodeled scheduling
* Fix carry over
* Fix stream
* Bumped _upper_bound_num_blocks
* Faster compute for physical read indices
* Final actual changes
* Address some todos
* Rename input_outputs
* Modify the behavior of async
* Fix bugs
* Added async tests
* Fix test
* Remodel example
* Fix offload test
* Fix real cause of offload fail
* Nits
* Propagate use_async
* Performance fixes 1
* More flexibility for cuda graphs
* Remodeled the read and write indices
* Review compliance
* More doc and beautiful ascii
* Style
* Fixes for end of generation
* Review compliance
* More tokens to pass test