DeepSpeed
No Muon optimizer for embedding and lm_head layers
#7641
Merged