Improve `clip_grad_norm` to use torch.linalg.vector_norm (#102429)
Done in this PR:
- Use `torch.linalg.vector_norm` instead of `torch.norm`
- Reduce the memory bandwidth of `clip_grad_norm` when used with the `inf` norm: `vector_norm` computes the absolute values internally, so there is no need to materialize the intermediate tensor returned by `abs`
What I'm slightly unsure about:
- I don't know whether the `torch._foreach` API supports the `inf` norm
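To illustrate the idea, here is a minimal sketch of gradient clipping built on `torch.linalg.vector_norm` (illustrative only, not the actual PR diff; the function name and structure are hypothetical):

```python
# Sketch: clip gradients using torch.linalg.vector_norm instead of torch.norm.
# Hypothetical helper, not the real torch.nn.utils.clip_grad_norm_ implementation.
import torch

def clip_grad_norm_sketch(parameters, max_norm, norm_type=2.0):
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    if norm_type == float("inf"):
        # inf-norm path: vector_norm fuses the abs(), so no intermediate
        # |g| tensor needs to be materialized.
        norms = [torch.linalg.vector_norm(g.detach(), float("inf")) for g in grads]
        total_norm = torch.max(torch.stack(norms))
    else:
        # General p-norm: norm of the per-gradient norms.
        per_grad = [torch.linalg.vector_norm(g.detach(), norm_type) for g in grads]
        total_norm = torch.linalg.vector_norm(torch.stack(per_grad), norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for g in grads:
            g.detach().mul_(clip_coef)
    return total_norm
```

For example, a single parameter with gradient `[3, 3, 3, 3]` has 2-norm 6, so clipping to `max_norm=1.0` scales every gradient entry by roughly 1/6.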
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102429
Approved by: https://github.com/lezcano