Fix Adam subgroup inconsistency (#7982)
Fix CPUAdam same-step subgroup drift in ZeRO-3 (#7819)
This PR ports the fix from #7820 to the latest DeepSpeed version.
It makes `Adam_Optimizer::IncrementStep` idempotent for repeated calls
at the same logical step and avoids unnecessary recomputation when the
step has not changed.
ZeRO-3/SuperOffload can invoke multiple subgroup updates within a single
logical step on a shared native optimizer object. The previous logic
mixed the incremental-multiply and full-recompute paths across those
calls, so the bias-correction metadata was not bit-identical from one
subgroup to the next.
This change aligns the step-transition logic in both the CPU and XPU
headers, clarifies first-step and non-sequential-step behavior, and
prevents unnecessary work on repeated same-step updates.
It also adds CPUAdam regression tests covering subgroup-style repeated
same-step updates through both `step_subgroup()` and `step()` with
parameter swapping.
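
For illustration only, the same-step behavior described above can be
sketched roughly as follows. This is a minimal standalone sketch, not the
code in the CPU/XPU headers: `StepTracker`, its field names, and the
`updates` counter are hypothetical stand-ins, and only the method name
`IncrementStep` is taken from this PR.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the optimizer's step-tracking state.
// Not DeepSpeed's actual Adam_Optimizer class.
struct StepTracker {
    std::size_t step = 0;
    float beta1 = 0.0f;
    float beta2 = 0.0f;
    float beta1_t = 1.0f;  // cached beta1^step
    float beta2_t = 1.0f;  // cached beta2^step
    int updates = 0;       // illustration only: times the cache was changed

    void IncrementStep(std::size_t new_step, float b1, float b2)
    {
        const bool betas_changed = (b1 != beta1) || (b2 != beta2);

        // Repeated call at the same logical step: keep the cached powers
        // untouched so every subgroup sees bit-identical values.
        if (!betas_changed && new_step == step) { return; }

        if (!betas_changed && new_step == step + 1) {
            // Sequential step: advance the cached powers by one multiply.
            beta1_t *= beta1;
            beta2_t *= beta2;
        } else {
            // First step, changed betas, or non-sequential step:
            // recompute the powers from scratch.
            beta1 = b1;
            beta2 = b2;
            beta1_t = static_cast<float>(
                std::pow(static_cast<double>(b1), static_cast<double>(new_step)));
            beta2_t = static_cast<float>(
                std::pow(static_cast<double>(b2), static_cast<double>(new_step)));
        }
        step = new_step;
        ++updates;
    }
};
```

In this sketch, calling `IncrementStep(2, ...)` several times in a row
changes the cached powers only on the first call; later calls return
early, which is the idempotence property the regression tests exercise.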
Signed-off-by: st_bang <st.bang@dgist.ac.kr>