Port `mm` CUDA from TH to ATen (#34891)
Summary:
Issue https://github.com/pytorch/pytorch/issues/24596
This PR moves the CUDA implementation of `mm` from TH to ATen. The internal `addmmImpl` function, on which the old TH CUDA `mm` was built, is ported as well.
It also makes a future port of CUDA `addmm` to ATen relatively straightforward, since TH `mm` and `addmm` shared the same `addmmImpl` function at their core.
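For readers unfamiliar with why one shared core can serve both ops, here is a minimal pure-Python sketch of the relationship. It is not the ATen or cuBLAS code: `addmm_impl`, `mm`, and `addmm` are hypothetical stand-ins operating on nested lists, and the sketch assumes the usual convention that `addmm` computes `beta * bias + alpha * (a @ b)`, with the bias ignored when `beta` is 0.

```python
# Illustrative sketch only: hypothetical pure-Python stand-ins, not the ATen code.

def addmm_impl(bias, a, b, beta=1.0, alpha=1.0):
    """Return beta * bias + alpha * (a @ b) for matrices given as nested lists."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    out = []
    for i in range(len(a)):
        row = []
        for j in range(len(b[0])):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            if beta == 0:
                # Assumed convention: with beta == 0 the bias is never read.
                row.append(alpha * acc)
            else:
                row.append(beta * bias[i][j] + alpha * acc)
        out.append(row)
    return out


def mm(a, b):
    # mm as the beta == 0, alpha == 1 special case of the shared core.
    return addmm_impl(None, a, b, beta=0.0, alpha=1.0)


def addmm(bias, a, b, beta=1.0, alpha=1.0):
    # addmm forwards its arguments to the same core unchanged.
    return addmm_impl(bias, a, b, beta=beta, alpha=alpha)
```

Under these assumptions, once the shared core exists in one place, `addmm` reduces to a thin wrapper over it, which is the sense in which the follow-up port should be small.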
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34891
Differential Revision: D20650713
Pulled By: ngimel
fbshipit-source-id: 692aba1bbae65a18d23855b5e101446082d64c66