Add TORCH_CUDA_CU_API to CUDABlas functions (#72305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72305
Export the CUDABlas functions so that they are accessible from other libraries.
Also make `REGISTER_ARCH_DISPATCH` export dispatches as TORCH_API, so that stubs can be called from libraries other than `torch_cpu`.
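For context, a minimal sketch of the general export-macro pattern that this kind of change relies on. `MYLIB_API`, `MYLIB_BUILD`, and `mylib_dot` are hypothetical names chosen for illustration; they are not PyTorch's macros or functions, and the real definitions of `TORCH_API` and `TORCH_CUDA_CU_API` may differ in detail.

```cpp
// Hypothetical flag: set while compiling the library that owns the symbols.
// In a real build this would come from the build system, not the source file.
#define MYLIB_BUILD 1

// Hypothetical export macro (an assumption, not PyTorch's definition).
#if defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)  // building the library: export
#  else
#    define MYLIB_API __declspec(dllimport)  // consuming the library: import
#  endif
#else
// With -fvisibility=hidden, only symbols marked "default" stay visible
// to other shared libraries.
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// Annotated function: visible to code linked against this library.
MYLIB_API float mylib_dot(const float* x, const float* y, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) {
    acc += x[i] * y[i];
  }
  return acc;
}
```

Without the annotation, a function compiled with hidden default visibility (or any function in a Windows DLL) would typically fail to link when called from another library.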
Test Plan: Imported from OSS
Reviewed By: dagitses, ngimel
Differential Revision: D33992798
Pulled By: malfet
fbshipit-source-id: baf08d7c704b01fe7d692cebe12b017da5ce98ff
(cherry picked from commit 97b94e809531927654cd355427ba66785a4e20e9)