Refactor the API of the matmul implementation
Previously, a single overload using `c10::optional` implemented the
logic of both `matmul` and `matmul_out`. As a result, some functions
(those in `linalg.matrix_exp`) called this `native::matmul`
implementation directly rather than going through the dispatcher.
In this PR, we remove the use of `c10::optional` and rename the
matmul implementation so that no one calls it by mistake.
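For illustration only, here is a minimal sketch of the general pattern: an explicitly named internal helper that both the allocating and the out-variant entry points forward to, with no optional argument. `Mat`, `detail::matmul_impl`, `matmul`, and `matmul_out` below are hypothetical toy names and types chosen for this example. They are not the ATen code touched by this PR and do not involve the dispatcher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tensor: a dense row-major 2-D matrix.
// Assumption: this is NOT at::Tensor, only an illustration.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> data;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

namespace detail {
// Hypothetical internal helper. The distinct name signals that it is an
// implementation detail and not a public entry point.
// Assumes `out` is already correctly sized and does not alias `a` or `b`.
inline void matmul_impl(Mat& out, const Mat& a, const Mat& b) {
  assert(a.cols == b.rows);
  assert(out.rows == a.rows && out.cols == b.cols);
  for (std::size_t i = 0; i < a.rows; ++i) {
    for (std::size_t j = 0; j < b.cols; ++j) {
      double acc = 0.0;
      for (std::size_t k = 0; k < a.cols; ++k) {
        acc += a.at(i, k) * b.at(k, j);
      }
      out.at(i, j) = acc;
    }
  }
}
}  // namespace detail

// Allocating variant: creates the result and forwards to the helper.
inline Mat matmul(const Mat& a, const Mat& b) {
  Mat out(a.rows, b.cols);
  detail::matmul_impl(out, a, b);
  return out;
}

// Out variant: writes into a caller-provided result and returns it.
inline Mat& matmul_out(const Mat& a, const Mat& b, Mat& out) {
  detail::matmul_impl(out, a, b);
  return out;
}
```

With two separate entry points, neither signature needs an optional parameter, and the helper's name makes an accidental direct call easier to spot in review.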
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75194
Approved by: https://github.com/ezyang, https://github.com/ngimel