Added CUDA support for complex input for torch.triangular_solve (#46916)
Summary:
`torch.triangular_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Ref. https://github.com/pytorch/pytorch/issues/33152
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916
Reviewed By: navahgar, agolynski
Differential Revision: D24706647
Pulled By: anjali411
fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3