Add BFloat16 support to CPU nansum (#61083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083
It's already supported on CUDA, so it seems reasonable to support it on CPU as
well. This also changes `test_nansum` to compare against `torch.sum`, since
NumPy doesn't support BFloat16. Note that `test_nansum_vs_numpy` still checks
against NumPy, so that comparison remains covered.
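A minimal sketch of the behavior this enables, and of the test strategy described above (comparing `torch.nansum` against `torch.sum` with NaNs zeroed, since NumPy has no BFloat16 dtype). The tensor values here are illustrative, not taken from the actual test:

```python
import torch

# nansum on a BFloat16 CPU tensor: NaN entries are treated as zero.
x = torch.tensor([1.0, float('nan'), 2.0], dtype=torch.bfloat16)
out = torch.nansum(x)

# Reference computed the way the updated test does: zero out NaNs,
# then use torch.sum (NumPy cannot serve as the reference for BFloat16).
ref = torch.sum(torch.nan_to_num(x, nan=0.0))
assert torch.equal(out, ref)
```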
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30006227
Pulled By: heitorschueroff
fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c