CUDA BFloat16 batchnorm (non-cuDNN) (#44994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44994
Reviewed By: ailzhang
Differential Revision: D25377525
Pulled By: ngimel
fbshipit-source-id: 42d583bbc364532264a4d3ebaa6b4ae02a0413de