pytorch
ddb65949 - [Gradient Compression] Add a random generator to PowerSGD state for initializing low-rank matrix Q (#48507)

Commit

3 years ago

[Gradient Compression] Add a random generator to PowerSGD state for initializing low-rank matrix Q (#48507)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48507

Previously, the random seed was the length of the input tensor, which is not guaranteed to be different for different batches. Now a random generator is initialized in the PowerSGD state, and this generator is used to create a random seed that randomizes the low-rank tensor Q at every step.

Therefore, the initial tensor Q should be the same across all replicas at the same step, but different at different steps.

'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

ghstack-source-id: 117483639

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Also checked the initial Qs and the input random seeds passed to torch.manual_seed() on different ranks for a few steps in real runs.

Example logs: exactly the same random seed on different ranks at the same step on two nodes, and the random seed varies at each step.
{F346971916}

Reviewed By: rohan-varma

Differential Revision: D25191589

fbshipit-source-id: f7f17df3ad2075ecae1a2a56ca082160f7c5fcfc
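
The seeding scheme described in the summary can be illustrated with a minimal sketch. This is not the code from the commit: it uses Python's standard `random` module as a stand-in for the generator held in the PowerSGD state and for `torch.manual_seed`, and the names `ReplicaState`, `init_q`, and `base_seed` are hypothetical. It assumes every replica builds its generator from the same base seed.

```python
import random


class ReplicaState:
    """Hypothetical stand-in for the per-replica PowerSGD state."""

    def __init__(self, base_seed=0):
        # Assumption: all replicas construct this generator with the same seed.
        self.rng = random.Random(base_seed)


def init_q(state, rows, rank):
    """Draw a fresh seed from the state's generator and use it to fill Q."""
    # The seed comes from the shared generator, not from the tensor length,
    # so it changes on every call even if the input shape never does.
    step_seed = state.rng.randint(0, 1_000_000_000)
    # Stand-in for seeding the framework RNG with step_seed before sampling Q.
    q_rng = random.Random(step_seed)
    q = [[q_rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(rows)]
    return step_seed, q


# Two replicas advance their generators in lockstep over three steps.
replica_a = ReplicaState(base_seed=0)
replica_b = ReplicaState(base_seed=0)
steps_a = [init_q(replica_a, rows=4, rank=2) for _ in range(3)]
steps_b = [init_q(replica_b, rows=4, rank=2) for _ in range(3)]
```

Because both generators start from the same base seed and are advanced once per step, `steps_a` and `steps_b` hold identical seeds and identical Q values at each step, while consecutive steps differ. Unlike `torch.manual_seed`, which sets a global seed, this sketch keeps the per-step generator local to `init_q`.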

Author

Yi Wang

Committer

facebook-github-bot

Parents

61936cb1
