Modify "gemm" code to enable access to "sbgemm_" routine in OpenBLAS (#58831)
Summary:
OpenBLAS recently added support for bfloat16 GEMM, so this change has PyTorch call out to OpenBLAS for that, like it does for single and double precision
Our goal is to try to enable PyTorch to make calls to "sbgemm" in OpenBLAS.
We are prepared (if it is your preference) to add fences to the code to limit this change to the Power architecture,
but our first instinct is that anyone on any architecture that enables access to sbgemm in their OpenBLAS library
should be able to use this code. (but again, we respect that as we are just starting to modify PyTorch, we respect
your guidance!)
(there is no issue number related to this)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58831
Reviewed By: albanD
Differential Revision: D29951900
Pulled By: malfet
fbshipit-source-id: 3d0a4a638ac95b2ff2e9f6d08827772e28d397c3