Port linalg_qr to structured

Commit

2 years ago

This PR simplifies the logic of `linalg.qr` using structured kernels. I also took this chance to merge a few `copy_` operations with other ops.

This PR removes the previous MAGMA implementation, as it is never faster than the cuSOLVER one and it is rather buggy. As a side effect, `qr` is no longer supported on ROCm. Ivan confirmed that this is fine, given how incredibly slow QR was on ROCm anyway (we were marking some tests as slow because of this).

This PR also corrects the dispatch in `geqrf`. Before, if we called it with a matrix for which `input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)` is false, and we had cuBLAS but not cuSOLVER, we would end up calling MAGMA rather than cuBLAS. This is not what the heuristic suggested. We should probably benchmark these heuristics again, but that is beyond the scope of this PR.

Note: it looks like `torch.geqrf` may be broken in MAGMA, as per the previous comment in `linalg_qr_helper_magma`. IvanYashchuk, wdyt?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79054
Approved by: https://github.com/IvanYashchuk, https://github.com/ezyang
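To make the corrected `geqrf` dispatch concrete, here is a minimal C++ sketch of the backend choice described in the commit message. Only the heuristic condition is taken from the text. `GeqrfBackend`, `prefer_cublas_batched` and `choose_geqrf_backend` are hypothetical names for illustration, not PyTorch's actual functions, and the sketch assumes cuBLAS is always available while cuSOLVER may be missing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical backend tag for illustration; not a PyTorch type.
enum class GeqrfBackend { CublasBatched, Cusolver };

// The heuristic quoted in the commit message, with
// rows == input.size(-2) and batch == batchCount(input).
// Small matrices in large-enough batches favor batched cuBLAS.
bool prefer_cublas_batched(int64_t rows, int64_t batch) {
  return rows <= 256 && batch >= std::max<int64_t>(2, rows / 16);
}

// Hypothetical dispatcher. Assumes cuBLAS is always available and only
// cuSOLVER may be missing. MAGMA is never chosen: when the heuristic
// prefers cuSOLVER but it is unavailable, fall back to cuBLAS.
GeqrfBackend choose_geqrf_backend(int64_t rows, int64_t batch, bool has_cusolver) {
  assert(rows >= 0 && batch >= 0);
  if (prefer_cublas_batched(rows, batch) || !has_cusolver) {
    return GeqrfBackend::CublasBatched;
  }
  return GeqrfBackend::Cusolver;
}
```

For example, a batch of 8 matrices with 64 rows satisfies `8 >= max(2, 64 / 16) = 4`, so it goes to batched cuBLAS; a single 512-row matrix fails the `rows <= 256` test and goes to cuSOLVER when it is available, and to cuBLAS otherwise.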

Author

lezcano

Committer

pytorchmergebot

Parents

2b6a0427

pytorch af6321f3 - Port linalg_qr to structured
