[ROCm] Add sparse mappings for CUDA->HIP translation (#67323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67323
Applied patch proposed by Jeff https://github.com/pytorch/pytorch/pull/63948#issuecomment-952166982.
In PyTorch, we map cuBLAS->rocBLAS and cuSPARSE->hipSPARSE. Note the prefix, roc versus hip.
The 'hip' APIs offer a more direct CUDA-friendly mapping, but calling rocBLAS directly has better performance.
Unfortunately, the `roc*` types and `hip*` types differ, e.g., `rocblas_float_complex` versus `hipComplex`.
In the case of SPARSE, we must use the hip types for complex instead of the roc types,
but the existing PyTorch mappings assume roc. Therefore, we create a new SPARSE mapping that has a higher priority.
Its mappings are tried first, and only when a lookup misses does the lower-priority PyTorch mapping apply.
When a file contains "sparse" in the filename, a mapping marked with API_SPARSE is preferred over other choices.
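The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the actual hipify code: `GENERIC_MAP`, `SPARSE_MAP`, and `translate_token` are hypothetical names, the table entries are a small illustrative subset, and the case-insensitive filename check is an assumption.

```python
# Hypothetical, heavily simplified tables. The real hipify mappings are much
# larger and carry extra metadata (such as the API_SPARSE marker).
GENERIC_MAP = {
    "cuComplex": "rocblas_float_complex",  # default mapping assumes roc types
    "cublasHandle_t": "rocblas_handle",
}
SPARSE_MAP = {
    "cuComplex": "hipComplex",  # hipSPARSE needs the hip complex types
}


def translate_token(token, filename):
    """Return the HIP spelling of a CUDA token, or the token unchanged."""
    # Assumption: the "sparse" check on the filename is case-insensitive.
    if "sparse" in filename.lower() and token in SPARSE_MAP:
        # Higher-priority sparse mapping wins when it has an entry.
        return SPARSE_MAP[token]
    # On a miss (or for non-sparse files), fall back to the generic mapping.
    return GENERIC_MAP.get(token, token)
```

With these tables, `cuComplex` becomes `hipComplex` in a file named `SparseCUDABlas.cpp` but `rocblas_float_complex` in `CUDABlas.cpp`, while `cublasHandle_t` falls through to the generic mapping in both files.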
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D31969246
Pulled By: cpuhrsch
fbshipit-source-id: 4ce1b35eaf9ef0d146a0955ce70c354ddd8f4669