Make `torch.lu` support complex input on CUDA. (#45898)
Summary:
As per title. LU decomposition is used for computing determinants, and I need this functionality to implement the matrix square root. The next PR on my list will enable `torch.det` on CUDA with complex input.
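
For context on why LU matters for determinants: with partial pivoting, PA = LU, so det(A) is the product of the diagonal of U times the sign of the row permutation. Below is a minimal pure-Python sketch of that identity for complex entries. It is an illustration only, not the CUDA code path touched by this PR, and `lu_det` is a hypothetical helper name.

```python
def lu_det(matrix):
    """Determinant via LU elimination with partial pivoting (illustrative only)."""
    n = len(matrix)
    # Work on a complex-valued copy so the input is left untouched.
    a = [list(map(complex, row)) for row in matrix]
    sign = 1
    for k in range(n):
        # Partial pivoting: choose the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            return 0j  # the whole column is zero, so the matrix is singular
        if p != k:
            a[k], a[p] = a[p], a[k]
            sign = -sign  # each row swap flips the sign of the determinant
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    # det(A) = sign(P) * product of the diagonal of U.
    det = complex(sign)
    for k in range(n):
        det *= a[k][k]
    return det
```

For the 2x2 matrix `[[1+1j, 2], [3, 4-1j]]`, the direct formula gives `(1+1j)*(4-1j) - 2*3 = -1+3j`, and `lu_det` agrees up to floating-point rounding.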
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45898
Reviewed By: heitorschueroff
Differential Revision: D24306951
Pulled By: anjali411
fbshipit-source-id: 168f578fe65ae1b978617a66741aa27e72b2172b