DeepSpeed
Allow separate learning rates "muon_lr" and "adam_lr" for Muon optimizer
#7658
Merged
