Update internal code for torch.lu_solve (#56611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56611
The goal of this refactoring is to make `torch.linalg.solve`
a composition of calls to `lu_stub` and `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based code path,
`torch.linalg.solve` will have one as well.
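For illustration only, the composition described above can be sketched in plain Python. This is not the ATen implementation: `lu_factor`, `lu_solve`, and `solve` are hypothetical stand-ins for `lu_stub`, `lu_solve_stub`, and `torch.linalg.solve`, and they handle a single square matrix with one right-hand side rather than batched tensors.

```python
def lu_factor(a):
    """Hypothetical stand-in for lu_stub: LU with partial pivoting, P*A = L*U.

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix, plus a
    permutation list where row i of P*A is row perm[i] of A.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest absolute value in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate below the pivot, storing the multipliers in place of L.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Hypothetical stand-in for lu_solve_stub: solve A*x = b from the factors."""
    n = len(lu)
    # Apply the row permutation to the right-hand side.
    y = [float(b[p]) for p in perm]
    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def solve(a, b):
    """Solve A*x = b as a composition of the two steps above."""
    lu, perm = lu_factor(a)
    return lu_solve(lu, perm, b)
```

Because `solve` only calls the two building blocks, swapping in a different backend for either one changes `solve` without touching its own code, which is the property the refactoring aims for.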
Replaced `lu_solve_helper` with `DECLARE_DISPATCH` for `lu_solve_stub`.
Removed an unnecessary copy, which improves performance (see https://github.com/pytorch/pytorch/pull/56611#issuecomment-824303673).
Split MAGMA-based `apply_lu_solve` into `apply_lu_solve_looped_magma`
and `apply_lu_solve_batched_magma`. This simplifies future dispatch to
cuSOLVER and cuBLAS.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142279
Pulled By: mruberry
fbshipit-source-id: 9d4baf650ca7a40b800616794408b34342d8d68f