[Relanding] Implemented torch.linalg.multi_dot (#52859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859
This reverts commit 92a4ee1cf6092dd941591f80885eb7fef5b2c0d8.
Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27402390
Pulled By: heitorschueroff
fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034