DeepSpeed
e45b5e4c - ZeRO-Offload v1 (squash) (#345)

Commit
5 years ago
ZeRO-Offload v1 (squash) (#345) * update DSE to point to ZeRO-Offload staging * ZeRO-2 enable CPU offload (#313) * cpu-offload * update * deleted: deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py modified: deepspeed/pt/fp16_unfused_optimizer.py new file: install_output.txt modified: tests/unit/test_dynamic_loss_scale.py * modified: deepspeed/pt/deepspeed_zero_optimizer.py * update * modified: deepspeed/pt/deepspeed_cpu_adam.py modified: deepspeed/pt/deepspeed_zero_optimizer.py modified: tests/unit/test_checkpointing.py modified: tests/unit/test_fp16.py * deleted: install_output.txt * modified: deepspeed/pt/fp16_unfused_optimizer.py modified: tests/unit/test_dynamic_loss_scale.py * modified: deepspeed/pt/deepspeed_cpu_adam.py * modified: deepspeed/pt/deepspeed_zero_optimizer.py * modified: deepspeed/pt/deepspeed_cpu_adam.py modified: deepspeed/pt/deepspeed_zero_optimizer.py * deleted: deepspeed_cpu_adam.py modified: deepspeed_light.py modified: deepspeed_zero_optimizer.py ../../deepspeed_zero_optimizer_cpu_offload.py * modified: deepspeed/pt/deepspeed_light.py * modified: deepspeed/pt/deepspeed_light.py modified: deepspeed/pt/deepspeed_zero_optimizer.py modified: deepspeed/pt/deepspeed_zero_utils.py modified: tests/unit/test_fp16.py * modified: deepspeed/pt/deepspeed_config.py modified: deepspeed/pt/deepspeed_light.py modified: deepspeed/pt/deepspeed_zero_optimizer.py modified: tests/unit/test_checkpointing.py modified: tests/unit/test_fp16.py * modified: deepspeed/pt/deepspeed_checkpointing.py * update DSE to ZeRO-Offload commit Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * Enable ZeRO checkpointing for ZeRO-Offload (#337) * Enable ZeRO checkpointing for ZeRO-Offload Fix unit tests Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397 * Fix accidental revert * Add ZeRO-Offload checkpointing model tests (#344) * Enable ZeRO checkpointing for ZeRO-Offload Fix unit tests Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397 * Fix accidental revert * Fix ZeRO-Offload checkpointing bug when change gpu count Add checkpointing model tests for ZeRO-Offload Remove optimizer key from Megatron model tests Use different deepspeed master port for Megatron model tests Co-authored-by: Jie <37380896+jren73@users.noreply.github.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Author
Committer
Parents
Loading