Enable cusolver potrf batched for Cholesky decomposition when cuda >= 11.3 (#57788)
Summary:
This PR uses cusolver potrf batched as the backend for Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when the CUDA version is 11.3 or later.
Benchmarks are available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. In most cases, cusolver potrf batched outperforms magma potrf batched.
## cholesky dispatch heuristics:
### before:
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
### after:
cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched
cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
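
For illustration, the "after" heuristics can be sketched in Python as below. This is not the code in this PR (the real dispatch lives in the C++ backend); the function name `select_cholesky_backend` and the `(major, minor)` version tuple are hypothetical, and the sketch assumes batch size >= 1.

```python
def select_cholesky_backend(batch_size, cuda_version):
    """Return the backend name chosen by the heuristics listed above.

    Hypothetical helper for illustration only.
    batch_size:   number of matrices in the batch (assumed >= 1)
    cuda_version: (major, minor) tuple, e.g. (11, 3)
    """
    # A single matrix always goes to the non-batched cusolver routine.
    if batch_size == 1:
        return "cusolver potrf"
    # Batched input: cusolver on cuda >= 11.3, magma otherwise.
    # Tuple comparison is lexicographic, so (11, 3) <= (12, 0) holds.
    if cuda_version >= (11, 3):
        return "cusolver potrf batched"
    return "magma xpotrf batched"
```

With this sketch, `select_cholesky_backend(8, (11, 2))` selects the magma path, while the same batch on `(11, 3)` selects cusolver potrf batched.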
---
See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53104 #53879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57788
Reviewed By: ngimel
Differential Revision: D28345530
Pulled By: mruberry
fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61