[Gradient Compression] Let the dtype of created low-rank tensors P and Q be the same type as the input tensor (#48902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48902
Previously, if the dtype of the input gradients was FP16, the matrix multiplications would fail, because the created low-rank tensors P and Q always used FP32.
Now P and Q are created with the same dtype as the input tensor.
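A minimal sketch of the idea (not the actual communication-hook code; the helper name and shapes here are illustrative): allocate the low-rank factors P and Q with the dtype and device of the input gradient tensor instead of hard-coding FP32, so subsequent matmuls see matching dtypes.

```python
import torch

def create_low_rank_factors(grad: torch.Tensor, rank: int):
    """Hypothetical helper: build P and Q for a 2D gradient matrix."""
    n, m = grad.shape
    # Matching dtype/device is the fix: torch.matmul(grad, q) would
    # raise a dtype-mismatch error if grad were FP16 but q were FP32.
    p = torch.empty(n, rank, device=grad.device, dtype=grad.dtype)
    q = torch.randn(m, rank, device=grad.device, dtype=grad.dtype)
    return p, q

grad = torch.randn(8, 4, dtype=torch.float16)
p, q = create_low_rank_factors(grad, rank=2)
assert p.dtype == q.dtype == grad.dtype  # all torch.float16
```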
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117962078
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
Reviewed By: rohan-varma
Differential Revision: D25362071
fbshipit-source-id: e68753ff23bb480605b02891e128202ed0f8a587