CUDA BF16 norm (#48806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48806
Reviewed By: mruberry
Differential Revision: D25358465
Pulled By: ngimel
fbshipit-source-id: 1a2afd86f39e96db0754d04bf81de045b1e1235c