ZeRO-Offload (squash) (#381)

Commit

5 years ago

ZeRO-Offload (squash) (#381)

* ZeRO-Offload v1 (squash) (#345)
* update DSE to point to ZeRO-Offload staging
* ZeRO-2 enable CPU offload (#313)
* cpu-offload
* update
* deleted: deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py
  modified: deepspeed/pt/fp16_unfused_optimizer.py
  new file: install_output.txt
  modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* update
* modified: deepspeed/pt/deepspeed_cpu_adam.py
  modified: deepspeed/pt/deepspeed_zero_optimizer.py
  modified: tests/unit/test_checkpointing.py
  modified: tests/unit/test_fp16.py
* deleted: install_output.txt
* modified: deepspeed/pt/fp16_unfused_optimizer.py
  modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
  modified: deepspeed/pt/deepspeed_zero_optimizer.py
* deleted: deepspeed_cpu_adam.py
  modified: deepspeed_light.py
  modified: deepspeed_zero_optimizer.py
  ../../deepspeed_zero_optimizer_cpu_offload.py
* modified: deepspeed/pt/deepspeed_light.py
* modified: deepspeed/pt/deepspeed_light.py
  modified: deepspeed/pt/deepspeed_zero_optimizer.py
  modified: deepspeed/pt/deepspeed_zero_utils.py
  modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_config.py
  modified: deepspeed/pt/deepspeed_light.py
  modified: deepspeed/pt/deepspeed_zero_optimizer.py
  modified: tests/unit/test_checkpointing.py
  modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_checkpointing.py
* update DSE to ZeRO-Offload commit

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Enable ZeRO checkpointing for ZeRO-Offload (#337)
* Enable ZeRO checkpointing for ZeRO-Offload
  Fix unit tests
  Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Add ZeRO-Offload checkpointing model tests (#344)
* Enable ZeRO checkpointing for ZeRO-Offload
  Fix unit tests
  Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Fix ZeRO-Offload checkpointing bug when change gpu count
  Add checkpointing model tests for ZeRO-Offload
  Remove optimizer key from Megatron model tests
  Use different deepspeed master port for Megatron model tests

Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* update DSE to staging for zero-dual
* Update test_sparse_attention.py
* Assert ZeRO-Offload+gradient accumulation (#347)
* Adding link to Sparse Attention in Navigation page (#355)
* adding link to Sparse Attention in Navigation page
* Correctness and perf fixes (#354)
* Update test_sparse_attention.py
* jren changes
* Merge with correctness/perf fixes
* Formatting fixes

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add cpu adam optimizer (#356)
* add cpu adam optimizer
* run precommit
* clean adam_test
* add accuracy test for adam
* make the adam unit test work with random params and grads and for more steps
* Samyamr/zero offload correctness (#359)
* fixing gradient accumulation for zero offload
* Bug fixes. ZeRO Stage 1,2 and Offload all produce the same loss with gradient accumulation step of 2
* Import path fixes + conditional imports (#358)
* use relative imports and add support for conditional op imports
* formatting and llvm command check change
* fix remaining absolute import
* hide the installed ops var
* fix unit tests

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* Enable contiguous gradients for cpu_offload
* Allocating CPU memory directly on CPU without transferring them from GPU (#360)
* Allocating CPU memory directly on CPU without transferring them from GPU
* formatting fixes
* change gpt2 pretrain to have DeepSpeed adam (#361)

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* Jekyll installation instructions (#351)
* Generalize detection of ZeRO supported optimizers (#349)
* Improve test for ZeRO supported optimizers
* Rename test function
* Format fixes
* Add model tests that wraps client FusedAdam with fused fp16 optimizer
* Format fixes
* everything is working
* fixing the cpu_adam API and add deepspeed_adam flag in config.py (#365)
* fixing the cpu_adam API and add deepspeed_adam flag in config.py
* run precommit
* fixing adam copy fp16-param-add more compile flags for cpu_adam
* run precommit
* fix variance indexes
* fix array-sizes
* ZeRO-Offload passing model functionality tests (#366)
* cpu_offload enables overlap_comm and contiguous_gradients
  Remove non-portable tensor.mul_()
* Model functionality tests now passing
* Move to perf tests folder
* move adam_test
* rename perf test
* fixing adam copy fp16-param and add more compile flags for cpu_adam (#367)
* fixing adam copy fp16-param-add more compile flags for cpu_adam
* run precommit
* fix variance indexes
* fix array-sizes
* move adam_test
* rename perf test
* Perf tests
* BumpDSE
* fixed a typo; this was fixed before but seems like it has been lost in the refactor (#364)
* Move code quality tests to Azure-hosted agents. (#368)
* add casting kernel
* run precommit
* revert changes
* revert changes
* ZeRO-Offload: Integration code fixes (#370)
* Various correctness fixes
* Format fixes
* Update installation instructions (#362)
* Update Sparse Attention Tutorial (#357)
* adding BingSquad e2e test
* updating the draft test; bring final step under try section
* finalizing test for base deepspeed and deepspeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
* update Sparse Attention Tutorial
* fixed few issues and applied comments for better organization and readability
* updated sparse attention tutorial with making how to use section incremental; applying more comments

Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>

* fixing corner cases (#371)
* fix adam performance (#372)
* fixing corner cases
* revert to the previous perf for adam
* adam high performance
* run precommit
* ZeRO-Offload passing model tests (#374)
* Add ZeRO-Offload model tests
  Restrict optimizer update+copy to DeepSpeedCPUAdam
* Format fixes
* Increase bucket size scaler
* fix cpu adam compilation for AVX2 (#378)
* fixing the compilation error for AVX2 architecture
* running precommit
* adding cpufeature to requirements
* Update install.sh
* Update install.sh
* include cpu-adam in the features
* update features
* update features

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Move code quality tests to Azure-hosted agents. (#368)
* Bump DSE
* adding sparse attention to feature index page (#377)
* support avx2 by default (#383)
* add DS_BUILD_AVX512 flag and update the feature part accordingly
* run precommit

Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
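
Note: the log above says ZeRO-2 gains CPU offload and that cpu_offload enables overlap_comm and contiguous_gradients. As a rough illustration only, here is a minimal sketch of a config that would request this behaviour. It assumes the zero_optimization key names (stage, cpu_offload, overlap_comm, contiguous_gradients) match the DeepSpeed config schema at this commit; the helper name make_zero_offload_config and the batch-size values are hypothetical and not taken from the commit. Check the key names against the installed version before relying on them.

```python
import json


def make_zero_offload_config(train_batch_size=8, grad_accum_steps=1):
    """Build a config dict for ZeRO stage 2 with CPU offload.

    Hypothetical helper for illustration; the key names are assumed to
    follow the DeepSpeed JSON config schema of this release.
    """
    return {
        "train_batch_size": train_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,  # the log enables offload for ZeRO-2
            "cpu_offload": True,
            # The log says cpu_offload turns these two on by itself;
            # they are spelled out here only to make that visible.
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


config = make_zero_offload_config()
# Serialize to the JSON form a config file would contain.
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON could be saved to a file and passed to a training script as its DeepSpeed config. Because the log reports that cpu_offload already implies overlap_comm and contiguous_gradients, listing them explicitly is likely redundant.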

References

#391 - ZeRO-Offload release

Author

jeffra

Committer

jeffra

Parents

01726ce2

DeepSpeed ad423f8f - ZeRO-Offload (squash) (#381)
