DeepSpeed
Update sharded_moe.py to support top2 gate with Tutel
#6948
Open
