DeepSpeed
Bug fix for norm calculation in absence of model parallel group
#551
Merged
