torch-native pipeline parallelism for big models (#2345)
* Broken version
* Timing I would expect
* Working version!
* Use MethodType
* working test
* Tests
* Use no_split_module_classes explicitly
* Put split_points in pipeline
* Store split points in hf_split_points
* Fix case num_process=1
* Allow for dynamic batch padding (#2352)
* Allow for dynamic batch padding
* Fix test
* Update src/accelerate/inference.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Break early after the first valid bs is found
* Less slicy-dicy
* Test cv model
* Start, need to test
* Use dataloader-like logic
* Refactor to utils
* With tests
* Update the source
* Clean
* bs=1 case
* Add test
* add some failing test
* Almost working version
* Much cleaner implementation
* Use pad_input_tensor
* All tests passing!
* Do it at tracing too
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
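The sub-PR above adds "dynamic batch padding": when a batch does not divide evenly across pipeline chunks, it is padded out (here by repeating the last sample) so every chunk gets the same size, and splitting breaks early once the first valid batch size is found. A minimal stdlib-only sketch of that idea — `pad_batch`, `split_into_chunks`, and `num_chunks` are illustrative names, not the actual accelerate API:

```python
# Illustrative sketch of dynamic batch padding, NOT the accelerate
# implementation: pad a batch so it splits evenly into num_chunks pieces.
from math import ceil

def pad_batch(batch, num_chunks):
    """Pad `batch` (a list) to a length divisible by `num_chunks`."""
    if num_chunks <= 1 or len(batch) % num_chunks == 0:
        return list(batch)  # already splits evenly, nothing to do
    chunk_size = ceil(len(batch) / num_chunks)  # first valid chunk size
    target = chunk_size * num_chunks
    # Repeat the last real sample to fill out the remainder.
    return list(batch) + [batch[-1]] * (target - len(batch))

def split_into_chunks(batch, num_chunks):
    """Split a (padded) batch into `num_chunks` equally sized pieces."""
    padded = pad_batch(batch, num_chunks)
    size = len(padded) // num_chunks
    return [padded[i * size:(i + 1) * size] for i in range(num_chunks)]
```

For example, a batch of 5 samples split over 2 chunks is padded to 6 by duplicating the final sample, so both chunks carry 3 items.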
* Rm literal
* Allow users to pass in max_memory
* Note about recursion
* Document, document, document
* Right import check
* Fix bug, add tests to multigpu runners
* Change default to None
* Start of docs
* Try again?
* Try again x2
* Trailing comma
* Move import
* Clean
* Add type hint
* Fix typo
* From code review
* Use num_chunks
* Update tests/test_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Bad copy/paste
* hf_split_points
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
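Several commits above revolve around split points ("Put split_points in pipeline", "Store split points in hf_split_points"): each split point names the layer that starts a new pipeline stage, so an ordered list of layer names plus the split points determines which stage owns each layer. A hedged stdlib-only sketch of that mapping — `assign_stages` and the layer names are hypothetical, not the accelerate API:

```python
# Illustrative sketch of how split points partition a model into pipeline
# stages: every layer before the first split point is stage 0, each split
# point begins the next stage.
def assign_stages(layer_names, split_points):
    """Map each layer name to a pipeline stage index."""
    stages, stage = {}, 0
    for name in layer_names:
        if name in split_points:
            stage += 1  # this layer opens the next pipeline stage
        stages[name] = stage
    return stages
```

With layers `["embed", "block.0", "block.1", "head"]` and a single split point at `"block.1"`, the first two layers land on stage 0 and the rest on stage 1.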