Fix DDP issue where parameters share same grad_accumulator (#46755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46755
As reported in https://github.com/pytorch/pytorch/issues/41324, there is a bug in DDP when `find_unused_parameters=True` and two or more parameters share the same gradient accumulator.
In the reducer, we currently keep a mapping from grad accumulator to parameter index and populate it with `map[accumulator] = index`, which overwrites the previously stored index whenever two parameters resolve to the same accumulator. To fix this, change the mapping's values to a vector of indices, so that every parameter index sharing an accumulator is retained.
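A minimal sketch of the data-structure change, with simplified, hypothetical types (`Accumulator` stands in for the autograd grad accumulator node the reducer actually keys on, and indices are plain `std::size_t` values rather than the reducer's variable indices):

```cpp
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the autograd grad accumulator node that the
// reducer keys its map on.
struct Accumulator {};

int main() {
  Accumulator shared;  // pretend two parameters hang off this accumulator
  Accumulator unique;

  // Old behavior: one index per accumulator, so the second assignment for
  // `shared` silently overwrites the first parameter's index.
  std::unordered_map<Accumulator*, std::size_t> old_map;
  old_map[&shared] = 0;
  old_map[&shared] = 1;  // index 0 is lost
  old_map[&unique] = 2;
  std::cout << "old map: shared accumulator -> index " << old_map[&shared]
            << std::endl;

  // New behavior: a vector of indices per accumulator, so every parameter
  // sharing the accumulator is tracked.
  std::unordered_map<Accumulator*, std::vector<std::size_t>> new_map;
  new_map[&shared].push_back(0);
  new_map[&shared].push_back(1);  // both indices retained
  new_map[&unique].push_back(2);
  std::cout << "new map: shared accumulator -> " << new_map[&shared].size()
            << " indices" << std::endl;
  return 0;
}
```

The sketch only illustrates why the vector-valued map is needed; the actual reducer change applies the same idea to its internal accumulator-to-index bookkeeping.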
ghstack-source-id: 115453567
Test Plan: Added UT
Reviewed By: pritamdamania87
Differential Revision: D24497388
fbshipit-source-id: d32dfa9c5cd0b7a8df13c7873d5d28917b766640