[BUG] Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication #7847
Fix pp+zero1 bugs
99697687
Merge branch 'deepspeedai:master' into master
76d4b496
Fix: MoE grad norm per-group averaging and accelerator-compatible dev…
e4aea16d
Merge branch 'deepspeedai:master' into master
a32d8d92
fix bug
bc2dda63
Merge branch 'master' into master
f29d66df
Assignees
No one assigned