DeepSpeed
54c06872 - stage3: efficient compute of scaled_global_grad_norm (#5256)

Committed 1 year ago
Use torch.norm instead of an inefficient for loop to compute scaled_global_grad_norm in ZeRO stage 3.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
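A minimal sketch of the idea, not DeepSpeed's actual implementation. When a global gradient norm is assembled from per-group L2 norms n_1, ..., n_k, it equals sqrt(n_1^2 + ... + n_k^2), which is exactly the L2 norm of the vector [n_1, ..., n_k]. A loop accumulates those squares one element at a time, while `torch.norm` over a stacked tensor computes the same value in a single call. The function names `global_norm_loop` and `global_norm_vectorized`, and the assumption that the per-group norms arrive as a list of 0-d tensors, are hypothetical and chosen only for illustration.

```python
import math

import torch


def global_norm_loop(norm_list):
    """Reference version: accumulate squared per-group norms one at a time.

    Each float() call here copies a value to the host, and each iteration
    is a separate small operation.
    """
    total = 0.0
    for n in norm_list:
        total += float(n) ** 2
    return math.sqrt(total)


def global_norm_vectorized(norm_list):
    """Stack the 0-d per-group norms into a 1-d tensor and take one L2 norm.

    This stays on the tensors' device and returns a 0-d tensor.
    """
    return torch.norm(torch.stack(norm_list), p=2)


# Usage: two hypothetical per-group norms, 3 and 4, give a global norm of 5.
norms = [torch.tensor(3.0), torch.tensor(4.0)]
print(global_norm_loop(norms))
print(global_norm_vectorized(norms).item())
```

Both functions return the same value. The vectorized form avoids per-element Python work and, as written in this sketch, the repeated host transfers that the loop's `float()` calls introduce.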