DeepSpeed
54c06872 - stage3: efficient compute of scaled_global_grad_norm (#5256)

Commit

2 years ago

stage3: efficient compute of scaled_global_grad_norm (#5256)

Use torch.norm instead of an inefficient for loop.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
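As a minimal sketch of the kind of change the commit message describes, the example below compares a Python loop that accumulates squared per-group gradient norms with a single `torch.norm` call over the stacked norms. It assumes the per-group norms are already available as 0-dim tensors. The function names `global_norm_loop` and `global_norm_stacked` are hypothetical and are not DeepSpeed's actual stage3 code. Both approaches give the same result because the L2 norm of all gradients equals the L2 norm of the vector of per-group L2 norms.

```python
import torch


def global_norm_loop(norm_groups):
    # Hypothetical "before" version: accumulate squared norms in a Python loop.
    # On GPU, each float() conversion forces a host-device synchronization.
    total = 0.0
    for n in norm_groups:
        total += float(n) ** 2
    return total ** 0.5


def global_norm_stacked(norm_groups):
    # Hypothetical "after" version: stack the per-group norms into one 1-D
    # tensor and take a single L2 norm, keeping the result on the device.
    return torch.norm(torch.stack(norm_groups), p=2)


# Usage: two gradient groups whose combined L2 norm is sqrt(9 + 16 + 144) = 13.
grads = [torch.tensor([3.0, 4.0]), torch.tensor([12.0])]
norms = [g.norm() for g in grads]
print(global_norm_loop(norms))     # 13.0
print(global_norm_stacked(norms))  # tensor(13.)
```

The stacked form replaces many small scalar operations with one tensor operation and avoids a device-to-host copy per group, which is the likely source of the inefficiency.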

References

lekurile/ds_chat_test_54c06872

#5256 - stage3: efficient compute of scaled_global_grad_norm

Author

nelyahu

Parents

7b5b0660
