DDP communication hook: skip dividing grads by world_size if hook registered. (#42400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42400
mcarilli spotted that in the original DDP communication hook design described in [#39272](https://github.com/pytorch/pytorch/issues/39272), the hooks receive grads that have already been predivided by the world size.
It makes sense to skip the division entirely when a hook is registered, since the hook is meant to let the user completely override DDP communication. For example, if the user wants to implement something like GossipGrad, always dividing by world_size would not be appropriate.
We also added a warning to the register_comm_hook API documentation:
> The GradBucket's tensors will not be predivided by world_size. The user is responsible for dividing by world_size when performing operations such as allreduce.
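
For illustration, below is a minimal sketch of an allreduce-style hook that performs the division itself. It assumes the hook signature where the hook receives a `GradBucket` and returns a `torch.futures.Future`; the exact bucket accessor (e.g. `buffer()` in newer releases vs. older list-based accessors) depends on the PyTorch version, so treat the details as an assumption rather than the exact code from this PR.

```python
import torch
import torch.distributed as dist

def allreduce_with_division_hook(process_group, bucket):
    # Once a hook is registered, the bucket's gradients are NOT predivided
    # by world_size, so the hook must divide explicitly itself.
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = group.size()

    tensor = bucket.buffer()   # flattened gradients for this bucket (version-dependent accessor)
    tensor.div_(world_size)    # the hook, not DDP, performs the division

    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    # DDP expects a Future whose value is the reduced tensor for this bucket.
    return fut.then(lambda f: f.value()[0])

# Hypothetical usage, after wrapping the model in DistributedDataParallel:
#   ddp_model.register_comm_hook(state=None, hook=allreduce_with_division_hook)
```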
ghstack-source-id: 109548696
**Update:** We discovered and fixed a bug in the sparse gradients case. See the new unit test `test_ddp_comm_hook_sparse_gradients` and the changes in `reducer.cpp`.
Test Plan: `python test/distributed/test_c10d.py` and perf benchmark tests.
Reviewed By: ezyang
Differential Revision: D22883905
fbshipit-source-id: 3277323fe9bd7eb6e638b7ef0535cab1fc72f89e