pytorch
4e110528 - Added cuSOLVER path for torch.linalg.eigh/eigvalsh (#53040)

Commit
5 years ago
Added cuSOLVER path for torch.linalg.eigh/eigvalsh (#53040)

Summary:
This PR adds the cuSOLVER-based path for `torch.linalg.eigh/eigvalsh`.
The device dispatching helper function was removed from native_functions.yaml and replaced with `DECLARE/DEFINE_DISPATCH`.

cuSOLVER is used if the CUDA version is >= 10.1.243. In addition, if the CUDA version is >= 11.1 (cuSOLVER version >= 11.0), the new 64-bit API is used.

I compared cuSOLVER's `syevd` with MAGMA's `syevd`. cuSOLVER is faster than MAGMA for all matrix sizes.

I also compared cuSOLVER's `syevj` (Jacobi algorithm) with `syevd` (QR-based divide-and-conquer algorithm). Although `syevj` is said to be better than `syevd` for smaller matrices, in my tests this holds only for the float32 dtype and matrix sizes from 32x32 to 512x512.

For batched inputs, comparing a for loop of `syevd/syevj` calls with `syevjBatched` shows that the batched routine is much faster for batches of matrices up to 32x32. However, `syevjBatched` has bugs: sometimes it does not compute the result, leaving the eigenvectors as a unit diagonal matrix and the eigenvalues as the real diagonal of the input matrix. The output is the same with `cupy.cusolver.syevj`, so the problem is definitely on the cuSOLVER side. This bug is not present in the non-batched `syevj`.

The performance of the 64-bit `syevd` is the same as that of the 32-bit version.

Ref. https://github.com/pytorch/pytorch/issues/47953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53040

Reviewed By: H-Huang

Differential Revision: D27401218

Pulled By: mruberry

fbshipit-source-id: aef91eefb57ed73fef87774ff9a36d50779903f7
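The version thresholds described above can be illustrated with a small Python sketch. This is not the code from the PR (the actual dispatch is in C++); the function names, the returned labels, and the assumption that MAGMA is the fallback below CUDA 10.1.243 are all hypothetical and only meant to show the ordering of the checks.

```python
from typing import Tuple

# Thresholds taken from the commit message; all names here are hypothetical.
CUSOLVER_MIN_CUDA = (10, 1, 243)      # cuSOLVER path is used from this CUDA version
CUSOLVER_64BIT_MIN_CUDA = (11, 1, 0)  # 64-bit cuSOLVER API is used from this CUDA version


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor[.patch]' into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def select_eigh_backend(cuda_version: str) -> str:
    """Return an illustrative label for the path chosen at a given CUDA version."""
    v = parse_version(cuda_version)
    # Check the stricter threshold first: tuples compare element by element.
    if v >= CUSOLVER_64BIT_MIN_CUDA:
        return "cusolver_syevd_64bit"
    if v >= CUSOLVER_MIN_CUDA:
        return "cusolver_syevd_32bit"
    # Assumption: older CUDA versions keep using the MAGMA path.
    return "magma_syevd"
```

For example, `select_eigh_backend("11.0")` falls between the two thresholds and yields the 32-bit cuSOLVER label, while `select_eigh_backend("10.1.105")` falls below the first threshold.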