ZeRO 3 Offload (#834)
* Squash stage3 v1 (#146)
Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
* Fix correctness bug (#147)
* formatting fix (#150)
* stage3 API update and bugfix, plus simplified FP16 Z3 tests (#151)
* fp16 Z3 API update and bugfix
* revert debug change
* ZeRO-3 detach and race condition bugfixes (#149)
* trying out ZeRO-3 race condition fix
* CUDA sync instead of stream
* reduction stream sync
* remove commented code
* Fix optimizer state_dict KeyError (#148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* fix for smaller SGS sizes; ensures each grad is backed by unique tensors (#152)
* Simplifying the logic for getting averaged gradients (#153)
* skip for now
* Z3 Docs redux (#154)
* removing some TODOs and commented code (#155)
* New Z3 defaults (#156)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* formatting
* megatron external params
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>