ZeRO-Offload v1 (squash) (#345)
* update DSE to point to ZeRO-Offload staging
* ZeRO-2 enable CPU offload (#313)
* cpu-offload
* update
* deleted: deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py
modified: deepspeed/pt/fp16_unfused_optimizer.py
new file: install_output.txt
modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* update
* modified: deepspeed/pt/deepspeed_cpu_adam.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: tests/unit/test_checkpointing.py
modified: tests/unit/test_fp16.py
* deleted: install_output.txt
* modified: deepspeed/pt/fp16_unfused_optimizer.py
modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
* deleted: deepspeed_cpu_adam.py
modified: deepspeed_light.py
modified: deepspeed_zero_optimizer.py
../../deepspeed_zero_optimizer_cpu_offload.py
* modified: deepspeed/pt/deepspeed_light.py
* modified: deepspeed/pt/deepspeed_light.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: deepspeed/pt/deepspeed_zero_utils.py
modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_config.py
modified: deepspeed/pt/deepspeed_light.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: tests/unit/test_checkpointing.py
modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_checkpointing.py
* update DSE to ZeRO-Offload commit
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Enable ZeRO checkpointing for ZeRO-Offload (#337)
* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Add ZeRO-Offload checkpointing model tests (#344)
* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Fix ZeRO-Offload checkpointing bug when changing GPU count
Add checkpointing model tests for ZeRO-Offload
Remove optimizer key from Megatron model tests
Use different deepspeed master port for Megatron model tests
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>