SuperOffload Release (#7559)

Commit

195 days ago

SuperOffload Release (#7559) This PR introduces **SuperOffload**—an optimizer designed for Superchips (Nvidia GH200 & GB200, AMD MI300A) with high CPU–GPU bandwidth. It enables **full fine-tuning** of **GPT-OSS-20B, Qwen3-14B, and Phi-4** on a single GH200 GPU, achieving up to **~500 TFLOPS**, using Hugging Face Transformers and DeepSpeed—no custom modeling code required. SuperOffload extends ZeRO-Offload with fine-grained control and CPUAdam rollback utilities, allowing GPU execution to overlap with CPUAdam. This reduces GPU idle time and improves overall efficiency. Key changes: - New SuperOffloadOptimizer_Stage3 optimizer. - C++/CUDA binding for adam_rollback to revert one optimization step. - Config additions including super_offload and cpuadam_cores_perc. A detailed blog and tutorial will be available soon. --------- Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>

References

#7559 - SuperOffload Release

Author

xylian86

Parents

17d80ce4

DeepSpeed af56ed4d - SuperOffload Release (#7559)

DeepSpeed
af56ed4d - SuperOffload Release (#7559)