DeepSpeed
7abc53a6 - Update Muon blog with measured convergence and memory data

32 days ago
Update Muon blog with measured convergence and memory data

Replace placeholder claims with actual experiment results:
- Add lr sweep results for both AdamW and Muon optimizers
- Report measured GPU memory: AdamW 34.5 GiB vs Muon 31.4 GiB (9% savings)
- Remove old convergence chart (adamw_vs_muon_3b.png)
- Fix inaccurate claims (Muon 19% better, Adam OOM on 2xA100)
- Add hybrid optimizer explanation and separate lr config docs

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
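The "hybrid optimizer" the message refers to is the common pattern of applying Muon only to matrix-shaped weights while biases, norms, and other parameters stay on AdamW, each group with its own learning rate. The following is a minimal sketch of that partitioning idea, not DeepSpeed's actual implementation; the function name, default learning rates, and group layout are illustrative assumptions.

```python
# Hypothetical sketch of the hybrid Muon/AdamW split described in the commit:
# route 2-D weight matrices to Muon (its orthogonalized update is defined for
# matrices) and everything else (biases, norms, embeddings) to AdamW, with a
# separate lr per group. Names and defaults here are assumptions for
# illustration, not DeepSpeed's API.

def split_param_groups(named_shapes, muon_lr=0.02, adamw_lr=3e-4):
    """named_shapes: iterable of (name, shape) pairs for model parameters.

    Returns two param-group dicts, one per optimizer, each with its own lr.
    """
    muon, adamw = [], []
    for name, shape in named_shapes:
        # Only matrix-shaped (2-D) parameters go to Muon.
        (muon if len(shape) == 2 else adamw).append(name)
    return [
        {"optimizer": "muon", "lr": muon_lr, "params": muon},
        {"optimizer": "adamw", "lr": adamw_lr, "params": adamw},
    ]

params = [
    ("attn.w_q", (4096, 4096)),  # matrix -> Muon
    ("attn.bias", (4096,)),      # vector -> AdamW
    ("ln.weight", (4096,)),      # vector -> AdamW
]
groups = split_param_groups(params)
```

A separate-lr config then just means setting `muon_lr` and `adamw_lr` independently, as the blog's lr sweep apparently does for each optimizer.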