DeepSpeed
8afcda2a - ZeRO Gradient Accumulation Dtype. (#2847)

Commit
1 year ago
ZeRO Gradient Accumulation Dtype. (#2847)

* Adding attributes for grad accum dtype.
* accumulating reduction grads in stage 2 mode 2
* missing colon
* tracking reduc grad move
* Correct hooks.
* Name change updates.
* Using grad_accum in cpu offload functions.
* Addressing comments: putting bf opt back, removing hooks
* Fixing missing pointer to grad accum.
* Renaming functions.
* More function renames.
* Adding reduction dtype.
* updating for offload
* Adding functionality for stage 3.
* Adding s3 test support.
* Add to MiCS optimizer.
* zero++ tutorial PR (#3783)
* Removing need to grad_reduc attribute.
* Offload correctness.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
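For orientation, here is a minimal sketch of how a gradient accumulation dtype might be requested alongside a ZeRO stage in a DeepSpeed config. It assumes the `data_types.grad_accum_dtype` key from the DeepSpeed configuration JSON; the helper `make_ds_config` and the batch-size values are hypothetical and are not taken from this commit. Which stage and dtype combinations are actually accepted depends on the DeepSpeed version.

```python
import json

# Dtype names assumed to be accepted by "data_types.grad_accum_dtype".
ALLOWED_GRAD_ACCUM_DTYPES = {"fp32", "fp16", "bf16"}


def make_ds_config(zero_stage: int, grad_accum_dtype: str) -> dict:
    """Hypothetical helper (not part of DeepSpeed): build a config dict that
    trains in bf16 under ZeRO and accumulates gradients in the given dtype."""
    if grad_accum_dtype not in ALLOWED_GRAD_ACCUM_DTYPES:
        raise ValueError(f"unsupported grad_accum_dtype: {grad_accum_dtype!r}")
    return {
        # Illustrative values only.
        "train_micro_batch_size_per_gpu": 1,
        "gradient_accumulation_steps": 4,
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
        # Accumulate gradients in a (possibly higher) precision than the
        # model's training dtype.
        "data_types": {"grad_accum_dtype": grad_accum_dtype},
    }


config = make_ds_config(zero_stage=2, grad_accum_dtype="fp32")
print(json.dumps(config, indent=2))
```

A dict like this would typically be passed to `deepspeed.initialize` through its `config` argument; the sketch stops short of that call so it runs without DeepSpeed installed.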
  • deepspeed/runtime
    • engine.py
    • zero
      • mics.py
      • stage3.py
      • stage_1_and_2.py
  • tests/unit/runtime
    • test_ds_initialize.py