Make DistributedDataParallel use new reducer (#18953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18953
This removes Python side bucketing code from DistributedDataParallel
and replaces it with calls to the new C++ based bucketing and reducing
code. To confirm this is working well, we ran a test with both the
previous implementation and the new implementation, and confirmed they
are numerically equivalent.
Performance is improved by a couple percent or more, including the
single machine multiple GPU runs.
Closes #13273.
Reviewed By: mrshenli
Differential Revision: D14580911
fbshipit-source-id: 44e76f8b0b7e58dd6c91644e3df4660ca2ee4ae2