Add TORCH_CUDA_CU_API to CUDABlas functions (take 2) (#72340)
Summary:
Export the CUDABlas functions with `TORCH_CUDA_CU_API` so that they are accessible from other libraries.
Also make `REGISTER_ARCH_DISPATCH` export dispatches as `TORCH_API`, so that stubs can be called from libraries other than `torch_cpu`. To satisfy Windows builds, add the same `TORCH_API` to the static member declarations, although the annotation is a no-op there on Linux.
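For illustration, here is a minimal sketch of the export pattern described above. It is not the actual PyTorch code: `MYLIB_API`, `MYLIB_BUILD`, `AddStub`, and `add_default` are hypothetical names that stand in for `TORCH_API` and a dispatch stub. It assumes the common convention of `__declspec(dllexport)`/`__declspec(dllimport)` on Windows and a default-visibility attribute elsewhere.

```cpp
// Hypothetical: pretend this translation unit is part of the library being built.
#define MYLIB_BUILD

// Hypothetical export macro standing in for TORCH_API.
#if defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)  // building the DLL: export symbols
#  else
#    define MYLIB_API __declspec(dllimport)  // consuming the DLL: import symbols
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

using add_fn = int (*)(int, int);

// Hypothetical stub type holding a function pointer as a static member.
struct AddStub {
  // The export macro goes on the static member declaration so that, on
  // Windows, code in another library can link against the member.
  static MYLIB_API add_fn DEFAULT;

  // Forward the call through the registered function pointer.
  int operator()(int a, int b) const { return DEFAULT(a, b); }
};

// Kernel implementation, local to this translation unit.
static int add_default(int a, int b) { return a + b; }

// Registration: define the static member and point it at the kernel.
add_fn AddStub::DEFAULT = add_default;
```

Without the macro on the declaration, a Windows consumer outside the defining library would typically fail to link against `AddStub::DEFAULT`, whereas a Linux build with default symbol visibility would not notice the difference.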
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72340
Reviewed By: janeyx99
Differential Revision: D34007756
Pulled By: malfet
fbshipit-source-id: 6dcc4e350920c72f8b1762a5018082f7aeec98e9
(cherry picked from commit 9c1f44df8a957d93cb25eabedc5f94889bb7a007)