[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device (#49711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49711
`torch.cuda.synchronize` falls back to the current device when no device argument is given. Pass this device explicitly so the scope of the synchronization is clear at the call site.
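For illustration, a minimal sketch of the pattern this change adopts (the `sync_after_compression` helper and its `tensor` argument are hypothetical; the actual call sites live in the PowerSGD communication hook):

```python
import torch

# Hypothetical helper showing the pattern adopted by this change: instead of
# relying on torch.cuda.synchronize() implicitly targeting the current device,
# name the device explicitly so the synchronization scope is obvious.
def sync_after_compression(tensor: torch.Tensor) -> None:
    # Before: torch.cuda.synchronize()   # implicit: whatever device is current
    # After:  synchronize exactly the device the tensor lives on.
    torch.cuda.synchronize(device=tensor.device)
```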
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 119017654
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
Reviewed By: rohan-varma
Differential Revision: D25672267
fbshipit-source-id: 62a2266727a2ea76175f3c438daf20951091c771