Enable tensor fragments for ZeRO 2 & 3 (#2727)
* Enable tensor fragments for ZeRO 2
* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update deepspeed/utils/tensor_fragment.py
* Support offload
* Support multi-gpu
* Cleanup
* WIP
* Update deepspeed/runtime/zero/stage3.py
* Support padding
* Update deepspeed/runtime/zero/stage3.py
* ZeRO-3 optimizer state support; aligned API
* Support frozen ZeRO-3 params
* Unit tests
* Check NVMe offload capability
* Formatting
* Docs
* More docs
* More docs
* Update docs/code-docs/source/zero3.rst
* More docs
* Update docs/code-docs/source/zero3.rst
* More docs
* More docs
* Update docs/code-docs/source/zero3.rst
* Update deepspeed/utils/tensor_fragment.py
* More docs
* Support unsharded fp32 grad
* Remove debug prints
* Fix off-by-one detection of empty grads
* Update deepspeed/utils/tensor_fragment.py
* Update deepspeed/utils/tensor_fragment.py
* Update deepspeed/utils/tensor_fragment.py
* Update deepspeed/runtime/zero/stage3.py
* Fix off-by-one error
* Skip ranks with no gradient data
* Formatting
* Add license
* Fix license
---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>