DeepSpeed
00edd290 - Fix AutoEP + Muon compatibility for batched expert tensors

Commit
44 days ago
Fix AutoEP + Muon compatibility for batched expert tensors

1. gram_newtonschulz: replace torch.addmm (2D only) with the equivalent a*Q + Z@Q to support batched 3D expert weight tensors [num_local_experts, n, m]. Also fix diagonal() to specify dim1/dim2 for 3D tensors.
2. deepseek_v3 preset: remove e_score_correction_bias from unsupported_router_bias_names, since auto_ep_layer.py already copies it correctly (lines 398-402).

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
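
The first change can be illustrated with a small standalone sketch. This is not the code from gram_newtonschulz; the helper name `scaled_add_matmul`, the tensor shapes, and the value of `a` are assumptions chosen for illustration. It checks that `a*Q + Z@Q` matches `torch.addmm` on 2D inputs, that the same expression works on a batched 3D tensor where `torch.addmm` does not accept the input, and that passing `dim1`/`dim2` to `diagonal()` returns one diagonal per expert.

```python
import torch


def scaled_add_matmul(Q, Z, a):
    # Hypothetical helper: computes a*Q + Z@Q. The @ operator broadcasts over
    # leading batch dimensions, so this works for 2D and batched 3D inputs.
    return a * Q + Z @ Q


torch.manual_seed(0)
a = 0.5  # illustrative coefficient, not taken from the commit

# 2D case: the expression agrees with torch.addmm(Q, Z, Q, beta=a),
# which computes beta*Q + Z@Q.
Q2 = torch.randn(4, 4)
Z2 = torch.randn(4, 4)
ref2 = torch.addmm(Q2, Z2, Q2, beta=a)
out2 = scaled_add_matmul(Q2, Z2, a)

# Batched case: shape [num_local_experts, n, n]. torch.addmm requires 2D
# matrices, so the reference is built one expert at a time.
Q3 = torch.randn(3, 4, 4)
Z3 = torch.randn(3, 4, 4)
out3 = scaled_add_matmul(Q3, Z3, a)
ref3 = torch.stack(
    [torch.addmm(Q3[e], Z3[e], Q3[e], beta=a) for e in range(Q3.shape[0])]
)

# diagonal() defaults to dim1=0, dim2=1, which would mix the expert axis into
# the diagonal. Naming the last two dims gives one diagonal per expert.
diag = Q3.diagonal(dim1=-2, dim2=-1)
expected_diag = torch.stack([Q3[e].diagonal() for e in range(Q3.shape[0])])
```

`torch.baddbmm` would be another way to handle the 3D case alone, but the plain expression covers both 2D and 3D inputs with one code path, which is the form the commit message describes.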