DeepSpeed
44c51e34 - Fix Adam subgroup inconsistency (#7982)

Commit
19 days ago
Fix Adam subgroup inconsistency (#7982)

Fix CPUAdam same-step subgroup drift in ZeRO-3 (#7819)

This PR ports the fix from #7820 to the latest DeepSpeed version. It makes `Adam_Optimizer::IncrementStep` idempotent for repeated calls at the same logical step and avoids unnecessary recomputation when the step has not changed.

ZeRO-3/SuperOffload can invoke multiple subgroup updates within a single logical step on a shared native optimizer object. The previous logic mixed multiply and recompute paths, producing non-bit-identical bias-correction metadata across subgroup calls.

This change aligns the step-transition logic in both the CPU and XPU headers, clarifies first-step and non-sequential-step behavior, and prevents unnecessary work on repeated same-step updates.

It also adds CPUAdam regression tests covering subgroup-style repeated same-step updates through both `step_subgroup()` and `step()` with parameter swapping.

Signed-off-by: st_bang <st.bang@dgist.ac.kr>
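The step-transition behavior described in the message can be illustrated with a minimal sketch. This is not the code from the patch: `AdamStepState`, its field names, the `recomputes` counter, and `check_same_step_is_stable` are hypothetical names chosen for illustration, and the three-way split (same step, next step, anything else) is an assumption about how an idempotent `IncrementStep` might be organized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical, simplified stand-in for the step-tracking state of a
// native Adam optimizer object shared by several subgroups.
struct AdamStepState {
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float beta1_pow = 1.0f;  // beta1^step, used for bias correction
    float beta2_pow = 1.0f;  // beta2^step, used for bias correction
    std::size_t step = 0;
    int recomputes = 0;      // counts how often the pow() path runs

    void increment_step(std::size_t new_step, float b1, float b2) {
        const bool betas_changed = (b1 != beta1) || (b2 != beta2);

        // Repeated call at the same logical step: leave everything untouched,
        // so every subgroup sees identical bias-correction values.
        if (!betas_changed && new_step == step) { return; }

        // Sequential transition: extend the running product by one factor.
        if (!betas_changed && new_step == step + 1) {
            step = new_step;
            beta1_pow *= beta1;
            beta2_pow *= beta2;
            return;
        }

        // Changed betas or a non-sequential step: recompute from scratch.
        beta1 = b1;
        beta2 = b2;
        step = new_step;
        beta1_pow = std::pow(beta1, static_cast<float>(step));
        beta2_pow = std::pow(beta2, static_cast<float>(step));
        ++recomputes;
    }
};

// Usage: two subgroup-style calls at the same logical step must agree.
inline void check_same_step_is_stable() {
    AdamStepState s;
    s.increment_step(1, 0.9f, 0.999f);   // first subgroup
    const float first = s.beta1_pow;
    s.increment_step(1, 0.9f, 0.999f);   // second subgroup, same step
    assert(s.beta1_pow == first);
}
```

In this sketch the early return on an unchanged step is what keeps a second subgroup call from taking a different arithmetic path than the first one, and it also skips the redundant work.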