DeepSpeed
52907a66 - stage3.py: do not scale if gradient_predivide_factor is 1.0 (#3630)

Commit
2 years ago
stage3.py: do not scale if gradient_predivide_factor is 1.0 (#3630)

this change also aligns with the logic before reduce_scatter_coalesced

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
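
The idea in the commit message can be sketched as below. This is a minimal illustration and not the code in stage3.py: the helper name `maybe_scale` and the use of plain Python floats in place of gradient tensors are assumptions made for the example. It only shows the guard the message describes, namely skipping the scaling pass when `gradient_predivide_factor` is 1.0.

```python
def maybe_scale(grads, gradient_predivide_factor):
    """Hypothetical helper for illustration; not DeepSpeed's stage3.py code.

    `grads` is a list of floats standing in for gradient tensors.
    """
    if gradient_predivide_factor == 1.0:
        # Dividing by 1.0 changes nothing, so skip the extra pass
        # and hand back the same list without copying it.
        return grads
    # Otherwise divide every gradient by the factor.
    return [g / gradient_predivide_factor for g in grads]


grads = [2.0, 4.0, 8.0]
unchanged = maybe_scale(grads, 1.0)  # same list object, no work done
scaled = maybe_scale(grads, 2.0)     # new list with each value halved
```

With real tensors the skipped branch would presumably save one elementwise operation per gradient, which is the apparent motivation for the guard.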