torch-native pipeline parallelism for big models (#2345)
* Broken version
* Timing I would expect
* Working version!
* Use MethodType
* working test
* Tests
* Use no_split_module_classes explicitly
* Put split_points in pipeline
* Store split points in hf_split_points
* Fix case num_process=1
* Allow for dynamic batch padding (#2352)
* Allow for dynamic batch padding
* Fix test
* Update src/accelerate/inference.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Break early after the first valid bs is found
* Less slicy-dicy
* Test cv model
* Start, need to test
* Use dataloader-like logic
* Refactor to utils
* With tests
* Update the source
* Clean
* bs=1 case
* Add test
* add some failing test
* Almost working version
* Much cleaner implementation
* Use pad_input_tensor
* All tests passing!
* Do it at tracing too
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
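The sub-PR above adds "dynamic batch padding": when a batch does not divide evenly across pipeline chunks, it is padded out (here by repeating the last sample) so every chunk gets the same size, and splitting breaks early once the first valid batch size is found. A minimal stdlib-only sketch of that idea — `pad_batch`, `split_into_chunks`, and `num_chunks` are illustrative names, not the actual accelerate API:

```python
# Illustrative sketch of dynamic batch padding, NOT the accelerate
# implementation: pad a batch so it splits evenly into num_chunks pieces.
from math import ceil

def pad_batch(batch, num_chunks):
    """Pad `batch` (a list) to a length divisible by `num_chunks`."""
    if num_chunks <= 1 or len(batch) % num_chunks == 0:
        return list(batch)  # already splits evenly, nothing to do
    chunk_size = ceil(len(batch) / num_chunks)  # first valid chunk size
    target = chunk_size * num_chunks
    # Repeat the last real sample to fill out the remainder.
    return list(batch) + [batch[-1]] * (target - len(batch))

def split_into_chunks(batch, num_chunks):
    """Split a (padded) batch into `num_chunks` equally sized pieces."""
    padded = pad_batch(batch, num_chunks)
    size = len(padded) // num_chunks
    return [padded[i * size:(i + 1) * size] for i in range(num_chunks)]
```

For example, a batch of 5 samples split over 2 chunks is padded to 6 by duplicating the final sample, so both chunks carry 3 items.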
* Rm literal
* Allow users to pass in max_memory
* Note about recursion
* Document, document, document
* Right import check
* Fix bug, add tests to multigpu runners
* Change default to None
* Start of docs
* Try again?
* Try again x2
* Trailing comma
* Move import
* Clean
* Add type hint
* Fix typo
* From code review
* Use num_chunks
* Update tests/test_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Bad copy/paste
* hf_split_points
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
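Several commits above revolve around split points ("Put split_points in pipeline", "Store split points in hf_split_points"): each split point names the layer that starts a new pipeline stage, so an ordered list of layer names plus the split points determines which stage owns each layer. A hedged stdlib-only sketch of that mapping — `assign_stages` and the layer names are hypothetical, not the accelerate API:

```python
# Illustrative sketch of how split points partition a model into pipeline
# stages: every layer before the first split point is stage 0, each split
# point begins the next stage.
def assign_stages(layer_names, split_points):
    """Map each layer name to a pipeline stage index."""
    stages, stage = {}, 0
    for name in layer_names:
        if name in split_points:
            stage += 1  # this layer opens the next pipeline stage
        stages[name] = stage
    return stages
```

With layers `["embed", "block.0", "block.1", "head"]` and a single split point at `"block.1"`, the first two layers land on stage 0 and the rest on stage 1.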