SemanticDiff

pytorch
bdfdf6c2 - C++ handler for gradient reduction (#18251)

Commit View On GitHub

Login via GitHub
Home
Pricing
FAQ
Install

Login via GitHub

Commit

5 years ago

C++ handler for gradient reduction (#18251) Summary: This commit adds the `c10d::Reducer` class that hooks into autograd and performs gradient bucketing and reduction. These are the core parts of `nn.parallel.DistributedDataParallel` that up to now were only usable for CUDA models. This should enable the following: * Distributed data parallelism for models defined using the C++ frontend. * Allow overlap of gradient computation and reduction for non-CUDA models. * Enable distributed data parallelism for models with some unused parameters. This does not include any logic for computing bucket assignment, which can be done separately; either by observing autograd execution order (this is what Apex does), or by assigning buckets based on some maximum byte size, or both. Also see #17757 and #13273. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251 Reviewed By: mrshenli Differential Revision: D14571899 Pulled By: pietern fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c

Author

pietern

pietern

Committer

facebook-github-bot

facebook-github-bot

Parents

FAQ Terms Privacy Refunds Impressum

Loading