[Foreach] Implement L1&L2 norm (#62646)
Summary:
Implement the L1 and L2 norms in the foreach fast path, using the [nvidia/apex](https://github.com/NVIDIA/apex/blob/master/csrc/multi_tensor_l2norm_kernel.cu) multi-tensor L2 norm kernel as a reference.
When `ord` is neither 1 nor 2, the slow path is used instead.
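
The dispatch rule above can be illustrated with a minimal pure-Python sketch. It is not the code in this PR: `foreach_norm_sketch` and `_generic_norm` are hypothetical names, plain lists of floats stand in for tensors, and the "fast path" here only marks where a fused multi-tensor kernel would apply.

```python
import math


def _generic_norm(values, ord):
    # Hypothetical helper: generic p-norm used by the fallback branch.
    if ord == math.inf:
        return max(abs(x) for x in values)
    return sum(abs(x) ** ord for x in values) ** (1.0 / ord)


def foreach_norm_sketch(tensors, ord=2):
    """Return one norm per input list, mirroring the fast/slow split."""
    if ord == 1:
        # Fast path (assumed to be a single fused launch in a real kernel).
        return [sum(abs(x) for x in t) for t in tensors]
    if ord == 2:
        # Fast path for the L2 norm.
        return [math.sqrt(sum(x * x for x in t)) for t in tensors]
    # Slow path: any other ord is computed one tensor at a time.
    return [_generic_norm(t, ord) for t in tensors]
```

Only `ord` values of 1 and 2 take the first two branches; every other value, including `math.inf`, falls through to the per-tensor fallback.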
Related: https://github.com/pytorch/pytorch/issues/58833
cc ptrblck mcarilli ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62646
Reviewed By: malfet
Differential Revision: D32173421
Pulled By: ngimel
fbshipit-source-id: 14b7544601658a979b83509df351e1848ded7675