DeepSpeed
[BUG] Fix: Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication
#7847
Open

Thinksky5124 wants to merge 6 commits into deepspeedai:master from Thinksky5124:master
Thinksky5124 Fix pp+zero1 bugs
99697687
Thinksky5124 requested a review from tjruwase 123 days ago
Thinksky5124 requested a review from tohtana 123 days ago
Thinksky5124 requested a review from loadams 123 days ago
chatgpt-codex-connector commented on 2026-02-12
Thinksky5124 Merge branch 'deepspeedai:master' into master
76d4b496
Thinksky5124 Fix: MoE grad norm per-group averaging and accelerator-compatible dev…
e4aea16d
tohtana commented on 2026-03-24
Thinksky5124 Merge branch 'deepspeedai:master' into master
a32d8d92
Thinksky5124 fix bug
bc2dda63
Thinksky5124 Merge branch 'master' into master
f29d66df

Assignees
No one assigned