Improve heuristics for linalg_lu_solve when B is a matrix (#79838)
When linalg_lu_solve was added in
https://github.com/pytorch/pytorch/pull/72935, I made the big mistake of
assuming that the choice of backend would not depend on the number of
columns of B. This assumption turned out to be wrong by a large margin.
This PR fixes that by adding a heuristic that takes the number of
columns of B into account. The heuristic is not simple and was
crafted by hand but, as the benchmark results show, it is effective.
@xwang233 the cusolver team should look into this one, as I was able to
outperform both the cublas and the cusolver algorithms by using
triangular solves...
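For readers unfamiliar with the approach, here is a minimal pure-Python
sketch of solving A X = B from an LU factorization using two triangular
solves, where B is a matrix with several columns. It is only an
illustration of the technique, not the CUDA code in this PR: the function
names lu_factor and lu_solve below are hypothetical helpers written for
this example, and the matrices are plain lists of lists.

```python
def lu_factor(A):
    """Factor a square matrix so that P A = L U (partial pivoting).

    Returns (LU, perm): LU packs the unit-lower factor L below the
    diagonal and U on and above it; row i of P A is row perm[i] of A.
    """
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for j in range(n):
        # Pick the largest pivot in column j at or below the diagonal.
        p = max(range(j, n), key=lambda i: abs(LU[i][j]))
        if LU[p][j] == 0:
            raise ValueError("matrix is singular")
        LU[j], LU[p] = LU[p], LU[j]
        perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            LU[i][j] /= LU[j][j]
            for c in range(j + 1, n):
                LU[i][c] -= LU[i][j] * LU[j][c]
    return LU, perm


def lu_solve(LU, perm, B):
    """Solve A X = B for an n-by-k matrix B via two triangular solves."""
    n, k = len(LU), len(B[0])
    # Apply the row permutation: X starts as P B.
    X = [B[p][:] for p in perm]
    # Forward substitution with the unit-lower factor: L Y = P B.
    for i in range(n):
        for j in range(i):
            for c in range(k):
                X[i][c] -= LU[i][j] * X[j][c]
    # Back substitution with the upper factor: U X = Y.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            for c in range(k):
                X[i][c] -= LU[i][j] * X[j][c]
        for c in range(k):
            X[i][c] /= LU[i][i]
    return X
```

Both substitution loops do work proportional to the number of columns of
B, which is one way to see why that count can matter when choosing
between backends.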
The benchmarks for the heuristics are here: https://github.com/pytorch/pytorch/pull/79838#issuecomment-1163802792
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79838
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD