fix(superoffload): preserve multi-group updates with shared CPU buffers (#7905) (#7906)
Fixes [issue #7905](https://github.com/deepspeedai/DeepSpeed/issues/7905).
- Preserve optimizer param-group metadata across ZeRO-3 subgroup
splitting so SuperOffload handles multiple optimizer groups correctly.
- Switch the CPU worker path to shared CPU parameter and gradient
buffers, removing the need to send updated parameters back through the
result queue.
- Make the GPU-to-CPU gradient copy asynchronous and submit CPU
optimizer work only after the copy is ready.
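The first bullet can be illustrated with a small sketch. This is not DeepSpeed's actual implementation; `split_into_subgroups` and the fixed `subgroup_size` are hypothetical, and the point is only that each subgroup copies its parent group's hyperparameters (`lr`, `weight_decay`, etc.) instead of inheriting a single global set, so multi-group optimizers keep per-group settings after ZeRO-3 splitting:

```python
def split_into_subgroups(param_groups, subgroup_size):
    """Split each optimizer param group into fixed-size subgroups,
    copying the group's non-param metadata onto every subgroup.

    Hypothetical sketch: param_groups follows the torch.optim convention
    of dicts with a "params" list plus hyperparameter keys.
    """
    subgroups = []
    for group in param_groups:
        # Everything except the parameter list is group metadata.
        meta = {k: v for k, v in group.items() if k != "params"}
        params = group["params"]
        for i in range(0, len(params), subgroup_size):
            sub = dict(meta)  # preserve lr, weight_decay, betas, ...
            sub["params"] = params[i:i + subgroup_size]
            subgroups.append(sub)
    return subgroups
```

With this shape, a group with `lr=0.01` keeps that learning rate on all of its subgroups even when another group uses `lr=0.1`.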
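The shared-buffer and async-copy bullets can be sketched together. In this toy model (not DeepSpeed's API; the class and names are illustrative), the CUDA device-to-host copy and its completion event are simulated with a `threading.Event`: the CPU worker updates parameters in place in a shared buffer, and it starts only after the copy-done event fires, so no updated parameters need to travel back through a result queue:

```python
import queue
import threading

class SharedBufferWorker:
    """Toy CPU optimizer worker operating on shared buffers.

    Hypothetical sketch: grad_buffer / param_buffer stand in for shared
    pinned CPU tensors, and copy_done stands in for a CUDA event recorded
    after an async GPU-to-CPU gradient copy.
    """

    def __init__(self):
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.tasks.get()
            if item is None:  # shutdown sentinel
                break
            grad_buffer, param_buffer, copy_done, lr = item
            # Don't touch the gradient buffer until the async copy lands.
            copy_done.wait()
            # Plain SGD update, in place: the caller sees the result
            # directly in the shared buffer, so no result queue is needed.
            for i, g in enumerate(grad_buffer):
                param_buffer[i] -= lr * g

    def submit(self, grad_buffer, param_buffer, copy_done, lr):
        self.tasks.put((grad_buffer, param_buffer, copy_done, lr))

    def shutdown(self):
        self.tasks.put(None)
        self.thread.join()
```

Submitting the task before the copy completes is safe here: the worker blocks on the event, mirroring how CPU optimizer work can be enqueued early but only execute once the gradient copy is ready.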
The figures below compare per-iteration time and GPU memory usage
against the non-offload baseline. The second figure shows a correctness
check of the updated version.
<img width="977" height="364" alt="image"
src="https://github.com/user-attachments/assets/8fb2cf21-1a8c-47dd-9090-ec73acc5c9dc"
/>
<img width="3248" height="1748" alt="image"
src="https://github.com/user-attachments/assets/d8121d64-dfd9-478c-87ea-b41e98630a2a"
/>
---------
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>