Enable HgemmBatched for ROCm (#37483)
Summary:
This PR enables HgemmBatched for ROCm. Because of an inconsistency between CUDA_VERSION and HIP_VERSION, THCudaBlas_HgemmStridedBatched() was not being called on ROCm builds.
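A minimal sketch of how a version guard keyed only on a CUDA version macro can skip the strided-batched path on a ROCm build. This is not the actual diff: DEMO_CUDA_VERSION, DEMO_ROCM_BUILD, the 9010 threshold, and the function names are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not PyTorch macros.
// DEMO_ROCM_BUILD plays the role of a "building for ROCm" flag.
// DEMO_CUDA_VERSION is deliberately left undefined, the way a CUDA toolkit
// version macro can be absent or meaningless in a HIP build.
#define DEMO_ROCM_BUILD 1

// Guard keyed only on the CUDA version. An undefined identifier inside #if
// evaluates to 0, so the "version too old" branch is taken on this build.
constexpr bool strided_batched_cuda_only() {
#if DEMO_CUDA_VERSION < 9010
  return false;  // would fall back to a slower per-matrix path
#else
  return true;   // would call the strided-batched routine
#endif
}

// Guard that also accepts the (hypothetical) ROCm build flag.
constexpr bool strided_batched_with_rocm() {
#if DEMO_CUDA_VERSION < 9010 && !defined(DEMO_ROCM_BUILD)
  return false;
#else
  return true;
#endif
}

// Small helper that checks both guards at run time.
inline void check_guards() {
  assert(!strided_batched_cuda_only());
  assert(strided_batched_with_rocm());
}
```

With the first guard the batched call is silently compiled out on the ROCm build; the second guard keeps it enabled there while leaving the CUDA version check in place.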
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37483
Differential Revision: D21395699
Pulled By: ngimel
fbshipit-source-id: c5c22d5f2041d4c9911558b2568fc9ce33ddeb5d