DeepSpeed
No Muon optimizer for embedding and lm_head layers
#7641
Merged

Commits
  • filter out embed layer and lm_head layer from Muon optimizer
    delock committed 163 days ago
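The commit above keeps embedding and lm_head weights out of the Muon optimizer. Below is a minimal sketch of that kind of parameter split. It is written in plain Python so it runs without PyTorch. The function `split_params_for_muon`, the `EXCLUDED_NAME_FRAGMENTS` tuple, and the name-substring matching rule are hypothetical illustrations, not DeepSpeed's API, and the actual filter in #7641 may identify these layers differently. The sketch follows a common Muon convention: only 2-D hidden-layer weight matrices go to Muon, and everything else goes to a fallback optimizer such as AdamW. That fallback group covers embeddings, the output head, biases, and norm gains.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical name fragments that mark embedding and output-head weights.
# Assumption: DeepSpeed's real filter may use a different matching rule.
EXCLUDED_NAME_FRAGMENTS: Tuple[str, ...] = ("embed", "lm_head")


def split_params_for_muon(
    named_shapes: Sequence[Tuple[str, Tuple[int, ...]]],
) -> Dict[str, List[str]]:
    """Partition parameter names into a Muon group and a fallback group.

    A parameter goes to Muon only if it is a matrix (2+ dimensions) and its
    name contains none of the excluded fragments. Everything else (biases,
    norm weights, embeddings, lm_head) goes to the fallback group, which
    would typically be handled by AdamW.
    """
    groups: Dict[str, List[str]] = {"muon": [], "fallback": []}
    for name, shape in named_shapes:
        is_matrix = len(shape) >= 2
        is_excluded = any(frag in name for frag in EXCLUDED_NAME_FRAGMENTS)
        if is_matrix and not is_excluded:
            groups["muon"].append(name)
        else:
            groups["fallback"].append(name)
    return groups


if __name__ == "__main__":
    # Illustrative names and shapes for a small decoder-only model.
    params = [
        ("model.embed_tokens.weight", (32000, 4096)),
        ("model.layers.0.self_attn.q_proj.weight", (4096, 4096)),
        ("model.layers.0.mlp.up_proj.weight", (11008, 4096)),
        ("model.layers.0.input_layernorm.weight", (4096,)),
        ("lm_head.weight", (32000, 4096)),
    ]
    groups = split_params_for_muon(params)
    print("muon:", groups["muon"])
    print("fallback:", groups["fallback"])
```

With a PyTorch model, the same function could be fed `[(name, tuple(p.shape)) for name, p in model.named_parameters()]`. The resulting name lists would then be mapped back to parameter objects when building each optimizer's param groups. The embedding and lm_head weights are 2-D, so a shape-only check would wrongly send them to Muon. That is why the sketch also needs the name-based exclusion.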