ZeRO-Offload (squash) (#381)
* ZeRO-Offload v1 (squash) (#345)
* update DSE to point to ZeRO-Offload staging
* ZeRO-2 enable CPU offload (#313)
* cpu-offload
* update
* deleted: deepspeed/pt/deepspeed_zero_optimizer_cpuoffload.py
modified: deepspeed/pt/fp16_unfused_optimizer.py
new file: install_output.txt
modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* update
* modified: deepspeed/pt/deepspeed_cpu_adam.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: tests/unit/test_checkpointing.py
modified: tests/unit/test_fp16.py
* deleted: install_output.txt
* modified: deepspeed/pt/fp16_unfused_optimizer.py
modified: tests/unit/test_dynamic_loss_scale.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
* modified: deepspeed/pt/deepspeed_zero_optimizer.py
* modified: deepspeed/pt/deepspeed_cpu_adam.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
* deleted: deepspeed_cpu_adam.py
modified: deepspeed_light.py
modified: deepspeed_zero_optimizer.py
../../deepspeed_zero_optimizer_cpu_offload.py
* modified: deepspeed/pt/deepspeed_light.py
* modified: deepspeed/pt/deepspeed_light.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: deepspeed/pt/deepspeed_zero_utils.py
modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_config.py
modified: deepspeed/pt/deepspeed_light.py
modified: deepspeed/pt/deepspeed_zero_optimizer.py
modified: tests/unit/test_checkpointing.py
modified: tests/unit/test_fp16.py
* modified: deepspeed/pt/deepspeed_checkpointing.py
* update DSE to ZeRO-Offload commit
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Enable ZeRO checkpointing for ZeRO-Offload (#337)
* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Add ZeRO-Offload checkpointing model tests (#344)
* Enable ZeRO checkpointing for ZeRO-Offload
Fix unit tests
Bump DSE to 33b9fb77c8cecdb49118188890f662526d8e9397
* Fix accidental revert
* Fix ZeRO-Offload checkpointing bug when changing GPU count
Add checkpointing model tests for ZeRO-Offload
Remove optimizer key from Megatron model tests
Use different deepspeed master port for Megatron model tests
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* update DSE to staging for zero-dual
* Update test_sparse_attention.py
* Assert ZeRO-Offload+gradient accumulation (#347)
* Adding link to Sparse Attention in Navigation page (#355)
* adding link to Sparse Attention in Navigation page
* Correctness and perf fixes (#354)
* Update test_sparse_attention.py
* jren changes
* Merge with correctness/perf fixes
* Formatting fixes
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* add cpu adam optimizer (#356)
* add cpu adam optimizer
* run precommit
* clean adam_test
* add accuracy test for adam
* make the adam unit test work with random params and grads and for more steps
* Samyamr/zero offload correctness (#359)
* fixing gradient accumulation for zero offload
* Bug fixes. ZeRO stages 1 and 2 and ZeRO-Offload all produce the same loss with 2 gradient accumulation steps
* Import path fixes + conditional imports (#358)
* use relative imports and add support for conditional op imports
* formatting and llvm command check change
* fix remaining absolute import
* hide the installed ops var
* fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
* Enable contiguous gradients for cpu_offload
* Allocating CPU memory directly on CPU without transferring them from GPU (#360)
* Allocating CPU memory directly on CPU without transferring them from GPU
* formatting fixes
* change gpt2 pretrain to have DeepSpeed adam (#361)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
* Jekyll installation instructions (#351)
* Generalize detection of ZeRO supported optimizers (#349)
* Improve test for ZeRO supported optimizers
* Rename test function
* Format fixes
* Add model tests that wrap client FusedAdam with fused fp16 optimizer
* Format fixes
* everything is working
* fixing the cpu_adam API and add deepspeed_adam flag in config.py (#365)
* fixing the cpu_adam API and add deepspeed_adam flag in config.py
* run precommit
* fixing adam copy fp16-param; add more compile flags for cpu_adam
* run precommit
* fix variance indexes
* fix array-sizes
* ZeRO-Offload passing model functionality tests (#366)
* cpu_offload enables overlap_comm and contiguous_gradients
Remove non-portable tensor.mul_()
* Model functionality tests now passing
* Move to perf tests folder
* move adam_test
* rename perf test
* fixing adam copy fp16-param and add more compile flags for cpu_adam (#367)
* fixing adam copy fp16-param; add more compile flags for cpu_adam
* run precommit
* fix variance indexes
* fix array-sizes
* move adam_test
* rename perf test
* Perf tests
* Bump DSE
* fixed a typo; this was fixed before but seems like it has been lost in the refactor (#364)
* Move code quality tests to Azure-hosted agents. (#368)
* add casting kernel
* run precommit
* revert changes
* revert changes
* ZeRO-Offload: Integration code fixes (#370)
* Various correctness fixes
* Format fixes
* Update installation instructions (#362)
* Update Sparse Attention Tutorial (#357)
* adding BingSquad e2e test
* updating the draft test; bring final step under try section
* finalizing test for base deepspeed and deepspeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
* update Sparse Attention Tutorial
* fixed a few issues and applied comments for better organization and readability
* updated sparse attention tutorial to make the how-to-use section incremental; applied more comments
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
* fixing corner cases (#371)
* fix adam performance (#372)
* fixing corner cases
* revert to the previous perf for adam
* adam high performance
* run precommit
* ZeRO-Offload passing model tests (#374)
* Add ZeRO-Offload model tests
Restrict optimizer update+copy to DeepSpeedCPUAdam
* Format fixes
* Increase bucket size scaler
* fix cpu adam compilation for AVX2 (#378)
* fixing the compilation error for AVX2 architecture
* running precommit
* adding cpufeature to requirements
* Update install.sh
* Update install.sh
* include cpu-adam in the features
* update features
* update features
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Move code quality tests to Azure-hosted agents. (#368)
* Bump DSE
* adding sparse attention to feature index page (#377)
* support avx2 by default (#383)
* add DS_BUILD_AVX512 flag and update the feature part accordingly
* run precommit
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>