DeepSpeed
stage3: efficient compute of scaled_global_grad_norm
#5256
Merged
