DeepSpeed
[MoE] Fix misuse of num_experts as expert parallel group size (ep_size)
#7551
Merged
