Fix regression in triangular_solve on CUDA when the number of batches is 1 (#23953)
Summary:
Changelog:
- When the number of batches is 1, dispatch to MAGMA's trsm instead of trsm_batched
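The dispatch rule in the changelog can be sketched as below. This is only an illustration, not the PyTorch/ATen source, which is C++ calling MAGMA. Here trsm and trsm_batched are hypothetical pure-Python stand-ins that record which path was taken. The solve itself is assumed to be plain back substitution for upper-triangular systems.

```python
from typing import List

Matrix = List[List[float]]


def solve_upper(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by back substitution; A is upper triangular with nonzero diagonal."""
    n, cols = len(a), len(b[0])
    x = [[0.0] * cols for _ in range(n)]
    for j in range(cols):
        for i in range(n - 1, -1, -1):
            s = b[i][j]
            for k in range(i + 1, n):
                s -= a[i][k] * x[k][j]
            x[i][j] = s / a[i][i]
    return x


# Hypothetical stand-ins for the single-matrix and batched MAGMA routines.
# Each records its name so the chosen path can be observed.
calls: List[str] = []


def trsm(a: Matrix, b: Matrix) -> Matrix:
    calls.append("trsm")
    return solve_upper(a, b)


def trsm_batched(a_batch: List[Matrix], b_batch: List[Matrix]) -> List[Matrix]:
    calls.append("trsm_batched")
    return [solve_upper(a, b) for a, b in zip(a_batch, b_batch)]


def triangular_solve(a_batch: List[Matrix], b_batch: List[Matrix]) -> List[Matrix]:
    """Dispatch on batch count: a single system goes to trsm, several go to trsm_batched."""
    if len(a_batch) == 1:
        return [trsm(a_batch[0], b_batch[0])]
    return trsm_batched(a_batch, b_batch)
```

For example, solving the upper-triangular system [[2, 1], [0, 4]] x = [5, 8] as a batch of one takes the trsm path and returns [[1.5], [2.0]]. Passing two copies of the same system takes the trsm_batched path instead.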
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23953
Test Plan:
- All triangular_solve tests should pass, confirming that the change is valid
Differential Revision: D16732590
Pulled By: ezyang
fbshipit-source-id: 7bbdcf6daff8a1af905df890a458ddfedc01ceaf