Add bfloat16 support in linear algebra on ROCm (#27719)
Summary:
This adds backend support for GEMM-style matrix multiplications with bf16 inputs and outputs (i.e., bgemm) to PyTorch on ROCm.
It also enables the operators that depend on bgemm.
With this change, bf16 matrices on ROCm can be multiplied on the GPU.
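
As a rough illustration (not taken from the PR itself, and assuming a ROCm build where HIP devices are exposed through the `cuda` device API), a bf16 matmul on the GPU would now dispatch to the bgemm path:

```python
import torch

# Hypothetical usage sketch: multiply two bf16 matrices on a ROCm GPU.
if torch.cuda.is_available():  # ROCm builds expose HIP devices via the cuda API
    a = torch.randn(128, 64, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(64, 32, device="cuda", dtype=torch.bfloat16)
    c = torch.matmul(a, b)  # dispatches to the bf16 bgemm backend enabled here
    print(c.dtype, c.shape)  # torch.bfloat16 torch.Size([128, 32])
```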
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27719
Differential Revision: D18653514
Pulled By: bddppq
fbshipit-source-id: 805db923579bec6fc8fd1c51eeb5b1ef85a96758