CUDA BFloat16 addmm, addmv (#44986)
Summary:
This PR was originally authored by slayton58. I steal his implementation and added some tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44986
Reviewed By: mruberry
Differential Revision: D23806039
Pulled By: ngimel
fbshipit-source-id: 305d66029b426d8039fab3c3e011faf2bf87aead