Add cusolver gesvdj and gesvdjBatched to the backend of torch.svd (#48436)
Summary:
This PR adds cusolver `gesvdj` and `gesvdjBatched` to the backend of `torch.svd`.
I benchmarked this with CUDA 11.1 on 2070, V100, and A100 GPUs. cusolver `gesvdj` and `gesvdjBatched` outperform magma in all square-matrix cases tested, so the cusolver backend replaces the magma backend when it is available.
When both matrix dimensions are no greater than 32, `gesvdjBatched` is used. Otherwise, `gesvdj` is used.
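The selection rule above can be sketched in plain Python. This is an illustration only, not the actual C++ dispatch code in this PR: the function name `pick_svd_routine` and the `threshold` parameter are hypothetical, and only the 32 cutoff and the two routine names come from the description.

```python
def pick_svd_routine(m: int, n: int, threshold: int = 32) -> str:
    """Return the cusolver routine name for an m x n matrix (illustrative only)."""
    # Both dimensions must be at or below the cutoff to take the batched path.
    if m <= threshold and n <= threshold:
        return "gesvdjBatched"
    # Any larger dimension falls back to the non-batched Jacobi routine.
    return "gesvdj"


if __name__ == "__main__":
    for shape in [(4, 4), (32, 32), (33, 32), (8, 64)]:
        print(shape, pick_svd_routine(*shape))
```

Note that the rule is inclusive: a 32x32 matrix takes the `gesvdjBatched` path, while 33x32 does not.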
Detailed benchmarks are available at https://github.com/xwang233/code-snippet/tree/master/svd.
Some relevant code and discussions:
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/linalg/svd_op_gpu.cu.cc
- https://github.com/google/jax/blob/master/jaxlib/cusolver.cc
- https://github.com/cupy/cupy/issues/3174
- https://github.com/tensorflow/tensorflow/issues/13603
- https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2019-s9226/
See also https://github.com/pytorch/pytorch/issues/42666 and https://github.com/pytorch/pytorch/issues/47953
Close https://github.com/pytorch/pytorch/pull/50516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48436
Reviewed By: ejguan
Differential Revision: D25977046
Pulled By: heitorschueroff
fbshipit-source-id: c27e705cd29b6fd7c8ac674c1f9f490fa26ee1bf