DeepSpeed
00edd290 - Fix AutoEP + Muon compatibility for batched expert tensors

Commit

44 days ago

Fix AutoEP + Muon compatibility for batched expert tensors

1. gram_newtonschulz: replace torch.addmm (2D only) with the equivalent a*Q + Z@Q to support batched 3D expert weight tensors of shape [num_local_experts, n, m]. Also fix diagonal() to specify dim1/dim2 for 3D tensors.

2. deepseek_v3 preset: remove e_score_correction_bias from unsupported_router_bias_names, since auto_ep_layer.py already copies it correctly (lines 398-402).

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
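
The first change can be illustrated with a small standalone sketch. This is not the code from the commit: the helper names (scaled_add_matmul, batched_diagonal), the tensor sizes, and the value of a are assumptions chosen for illustration. It only shows that a*Q + Z@Q matches torch.addmm(Q, Z, Q, beta=a) applied per expert, and that passing dim1/dim2 to diagonal() yields one diagonal per expert for a 3D tensor.

```python
import torch


def scaled_add_matmul(Q, Z, a):
    """Return a*Q + Z@Q. Hypothetical helper, not the DeepSpeed function.

    torch.addmm only accepts 2D matrices; the @ operator broadcasts over
    leading batch dimensions, so this form also handles [E, n, m] inputs.
    """
    return a * Q + Z @ Q


def batched_diagonal(X):
    """Return the diagonal of each matrix in X: [E, n, n] -> [E, n].

    Without dim1/dim2, diagonal() uses dims 0 and 1, which on a 3D tensor
    would pair the expert dimension with the row dimension.
    """
    return X.diagonal(dim1=-2, dim2=-1)


torch.manual_seed(0)
E, n, m = 4, 8, 6  # assumed sizes: num_local_experts, rows, columns
Z = torch.randn(E, n, n)
Q = torch.randn(E, n, m)
a = 0.5  # assumed coefficient

out = scaled_add_matmul(Q, Z, a)

# Reference: the 2D-only torch.addmm applied to each expert separately.
ref = torch.stack([torch.addmm(Q[e], Z[e], Q[e], beta=a) for e in range(E)])
matches_addmm = torch.allclose(out, ref, atol=1e-4)

diag = batched_diagonal(Z)
```

The same function works unchanged on a plain 2D weight, so a caller would not need to branch on tensor rank.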

References

gma/autoep-muon-fixes

#7938 - Add AutoEP

Author

delock

Parents

dde07ab8
