Support sparse gradients in DistributedDataParallel (#19443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19443
This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding)
is expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
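
A minimal single-process sketch of where such sparse gradients come from: an embedding constructed with sparse=True yields a sparse gradient on its weight after backward, which is the case the reducer must be told about ahead of time (the wrapping in DistributedDataParallel is omitted here, since it requires process group setup).

```python
import torch

# An embedding with sparse=True produces a sparse gradient on its weight.
emb = torch.nn.Embedding(10, 4, sparse=True)

# Forward over a few indices, then backward on a scalar loss.
out = emb(torch.tensor([1, 2, 3])).sum()
out.backward()

# The gradient is sparse; the reducer places such a parameter in its
# own bucket rather than flattening it together with dense gradients.
assert emb.weight.grad.is_sparse
```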
Reviewed By: mrshenli
Differential Revision: D15007365
fbshipit-source-id: f298e83fd3ca828fae9e80739e1db89d045c99ac