DeepSpeed
90793aab - re-introduce: stage3: efficient compute of scaled_global_grad_norm (#5493)

1 year ago
Re-introduces this feature by reverting the previous revert of it: https://github.com/nelyahu/DeepSpeed/commit/bc48371c5e1fb8fd70fc79285e66201dbb65679b. This commit also includes a bug fix for offload mode.