Sparse CSR CUDA: Support mixed memory format input for triangular_solve (#66401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401
This PR fixes triangular_solve for the case where the result and input
tensors have different strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use the correct strides
when writing the result. PyTorch works around this by copying the input
tensor into a tensor with the same strides as the result tensor.
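The effect of the workaround can be illustrated with a small pure-Python
sketch. It is not the PyTorch or cuSPARSE code: `buggy_kernel` is a
hypothetical stand-in that copies B into X (an identity "solve") and is
assumed to write X using B's strides, which is one plausible reading of
"doesn't use the correct strides". All helper names are illustrative.

```python
def to_layout(rows, strides):
    """Pack a nested-list matrix into a flat buffer with the given strides."""
    n_rows, n_cols = len(rows), len(rows[0])
    buf = [0] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            buf[i * strides[0] + j * strides[1]] = rows[i][j]
    return buf


def from_layout(buf, strides, shape):
    """Read a flat buffer back into a nested list using the given strides."""
    return [[buf[i * strides[0] + j * strides[1]] for j in range(shape[1])]
            for i in range(shape[0])]


def buggy_kernel(b_buf, b_strides, x_buf, x_strides, shape):
    """Hypothetical kernel: ignores x_strides and writes X with B's strides."""
    for i in range(shape[0]):
        for j in range(shape[1]):
            offset = i * b_strides[0] + j * b_strides[1]
            x_buf[offset] = b_buf[offset]


def solve_with_workaround(b_buf, b_strides, x_buf, x_strides, shape):
    """Copy B into X's layout first, so the stride mix-up becomes harmless."""
    if b_strides != x_strides:
        b_buf = to_layout(from_layout(b_buf, b_strides, shape), x_strides)
        b_strides = x_strides
    buggy_kernel(b_buf, b_strides, x_buf, x_strides, shape)


shape = (2, 3)
row_major, col_major = (3, 1), (1, 2)
B = [[1, 2, 3], [4, 5, 6]]
b_buf = to_layout(B, col_major)   # input is column-major
x_wrong = [0] * 6                 # result is row-major
x_fixed = [0] * 6

buggy_kernel(b_buf, col_major, x_wrong, row_major, shape)
solve_with_workaround(b_buf, col_major, x_fixed, row_major, shape)

wrong = from_layout(x_wrong, row_major, shape)   # scrambled entries
fixed = from_layout(x_fixed, row_major, shape)   # matches B
```

The extra copy costs one pass over the input, but it keeps the kernel
call unchanged and only triggers when the strides actually differ.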
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D32177966
Pulled By: cpuhrsch
fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63