Dispatch to mv rather than mm when tensor1.ndim == 1 and tensor2.ndim == 2
This should be faster, and it simplifies the calling code. It also
fixes a bug when using matmul with the out= parameter, which
previously threw an incorrect error.
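For illustration only, here is a minimal pure-Python sketch of why the two dispatch paths agree. It assumes the mv path computes the vector-matrix product as the transposed matrix times the vector. The helpers `mm`, `mv`, `transpose`, `vec_mat_via_mm`, and `vec_mat_via_mv` are hypothetical stand-ins written for this example; they are not PyTorch's kernels and not necessarily how this change is implemented.

```python
# Hypothetical stand-ins operating on nested lists; not PyTorch code.

def mm(a, b):
    """Matrix-matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mv(m, v):
    """Matrix-vector product of m (n x k) and v (length k)."""
    return [sum(row[p] * v[p] for p in range(len(v))) for row in m]


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def vec_mat_via_mm(v, m):
    # mm path: treat v as a 1 x k matrix, multiply, then drop the extra dim.
    return mm([v], m)[0]


def vec_mat_via_mv(v, m):
    # mv path (assumed form): v @ m equals transpose(m) @ v.
    return mv(transpose(m), v)


v = [1, 2, 3]                  # 1-D operand, length 3
m = [[1, 2], [3, 4], [5, 6]]   # 2-D operand, shape 3 x 2

via_mm = vec_mat_via_mm(v, m)
via_mv = vec_mat_via_mv(v, m)
print(via_mm, via_mv)
```

Both paths return the same length-2 result. The mv path avoids adding and then removing a dimension on the 1-D operand.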
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75195
Approved by: https://github.com/ezyang