transformers
9f7faed2 - [CB] [Major] Asynchronous batching (#43960)

Commit

2 days ago

[CB] [Major] Asynchronous batching (#43960)

* Cleanup: batch is more self contained
* Created utils.py file
* Moved pad to utils
* Pin memory for input and outputs
* Consolidate inputs into a bulk tensor
* Consolidated read and write indices
* Add the transfer_inputs fn
* Renames and getters
* Remove useless sync
* Move graphs to the IOs
* Async done except for carry_in_ids
* Add carry over (scheduler not picking up tho)
* Remodeled scheduling
* Fix carry over
* Fix stream
* Bumped _upper_bound_num_blocks
* Faster compute for physical read indices
* Final actual changes
* Address some todos
* Rename input_outputs
* Modify the behavior of async
* Fix bugs
* Added async tests
* Fix test
* Remodel example
* Fix offload test
* Fix real cause of offload fail
* Nits
* Propagate use_async
* Performance fixes 1
* More flexibility for cuda graphs
* Remodeled the read and write indices
* Review compliance
* More doc and beautiful ascii
* Style
* Fixes for end of generation
* Review compliance
* More tokens to pass test

References

#43960 - [CB] [Major] Asynchronous batching

Author

remi-or

Parents

5ef42f53
