Migrate 'torch.dot' from TH to ATen (CUDA) (#40646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40646
Supports double, float, and at::Half.
Avoids creating the output result on the CPU.
Both input tensors must be on the GPU.
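
The behavior described above could be exercised from Python roughly as follows. This is a minimal sketch, not part of the change itself: the helper names `dot_on_gpu` and `mixed_devices_rejected` are hypothetical, the snippet assumes a PyTorch build with CUDA and skips the checks when no GPU is available, and it assumes a mixed CPU/GPU call fails with a RuntimeError.

```python
import torch


def dot_on_gpu(dtype=torch.float32):
    """Hypothetical helper: torch.dot of two small CUDA vectors, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    a = torch.tensor([1.0, 2.0, 3.0], dtype=dtype, device="cuda")
    b = torch.tensor([4.0, 5.0, 6.0], dtype=dtype, device="cuda")
    # The result is a 0-dim tensor that is expected to stay on the GPU.
    return torch.dot(a, b)


def mixed_devices_rejected():
    """Hypothetical helper: True if a CPU/GPU mix is rejected, None without a GPU."""
    if not torch.cuda.is_available():
        return None
    a = torch.ones(3, device="cuda")
    b = torch.ones(3)  # deliberately left on the CPU
    try:
        torch.dot(a, b)
    except RuntimeError:  # assumption: the device check surfaces as RuntimeError
        return True
    return False


if __name__ == "__main__":
    # The three dtypes listed in the summary: double, float, at::Half.
    for dt in (torch.float64, torch.float32, torch.float16):
        result = dot_on_gpu(dt)
        if result is not None:
            print(dt, result.item(), result.device)
    print("mixed devices rejected:", mixed_devices_rejected())
```

The inputs are small integers so the product sum is exactly representable in half precision as well, which keeps the comparison across dtypes simple.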
Reviewed By: ngimel
Differential Revision: D22258840
fbshipit-source-id: 95f4747477f09b40b1d682cd1f76e4c2ba28c452