DeepSpeed
[Blog] Muon Optimizer Support in DeepSpeed
#7962
Merged

delock merged 28 commits into master from gma/muon_blog
delock force pushed from 62714543 to 5da5fad6 62 days ago
delock marked this pull request as ready for review 61 days ago
delock requested a review from loadams 61 days ago
delock requested a review from tjruwase 61 days ago
chatgpt-codex-connector
delock force pushed from d1b1497a to 79325337 61 days ago
sfc-gh-truwase commented on 2026-04-20
sfc-gh-truwase commented on 2026-04-20
sfc-gh-truwase commented on 2026-04-20
delock force pushed from 9aaa30af to 1ff51eb1 42 days ago
delock force pushed from 773f05fc to 52c9c296 42 days ago
PKUWZP commented on 2026-05-03
delock force pushed from 7d945589 to 454eb1a2 35 days ago
delock force pushed from b9f133f8 to d4f93344 29 days ago
delock Muon optimizer blog draft
8aa8f4f7
delock add contributor list
b79f2280
delock fix checkboxes
40d7eb61
delock expand memory analysis
4a3f2aae
delock trim down
a0f8653f
delock remove memory data
315055bc
delock fix formatting
b3a27209
delock force pushed from d4f93344 to b78d616b 26 days ago
delock fix gramma
e64b7aa3
delock Add convergence experiment result and fix typos in Muon blog
feddb764
delock Add training configuration caption to convergence chart
65c0010e
delock Update Muon blog with measured convergence and memory data
35f8e764
delock Update Muon blog future plan: mark ZeRO stage 3 and Gram NS as done
8d034c0f
delock Add Muon pretraining convergence advantage to What is Muon section
bb164d0c
delock Revamp future plan into What's Next with active roadmap tone
3dcc5a3a
delock Add GLM-5 as Muon adopter and fix What's Next roadmap
c1f89b16
delock Add Muon blog to Latest News in README and docs landing page
201ca711
delock Refine Muon blog: convergence results, LR tuning guide, and formattin…
d954e232
delock Update Muon blog: convergence results, citation fixes, and DeepSeek-V4
0085c95a
delock Add Peng Du (@pengdurice) to Muon blog contributors
36f78dbf
delock Remove eval loss curve from Muon blog
09018d55
delock Update Muon blog: final experiment results with tuned learning rate
49be0447
delock Fix metric count in Muon blog: 3 out of 4
b3c0d129
delock Fix improvement numbers in Muon blog: use absolute pp difference
84f6a87c
delock Update release date from April to May
e7275eb1
delock Reorder Muon blog above SDMA in README
a665878d
delock Add SDMA entry to docs/index.md
1eaa8009
delock force pushed from 4306f2d6 to 1eaa8009 26 days ago
delock Sync blog README with Google Doc edits
4f737d94
sfc-gh-truwase approved these changes on 2026-05-20
delock Merge branch 'master' into gma/muon_blog
f65df29e
delock merged de473091 into master 21 days ago
delock deleted the gma/muon_blog branch 21 days ago

Assignees
No one assigned