Delay reduction of unused parameters until first autograd hook is called (#22219)
Summary:
Reduction of gradients for unused parameters should happen as soon as
possible, because those reductions can otherwise block reduction of
gradients for used parameters. This used to happen immediately when
`prepare_for_backward` was called and found parameters that didn't
contribute. This meant that if you had a model with unused parameters
and wanted to discard the model output (i.e. not call backward on some
loss), reduction of the gradients of those unused parameters would
already have been kicked off, and you'd see an error the next time you
called `forward`.
This commit changes that approach to delay reduction of the gradients
of those unused parameters until the first autograd hook is called.
This means you can now discard the model output regardless of whether
the model has unused parameters.
This is a prerequisite for making the `find_unused_parameters`
argument to DDP default to `True`.
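For illustration only (not part of this change), a minimal sketch of the
scenario this enables, assuming a single-process gloo group and a
hypothetical `ModelWithUnusedParam` module: the output of the first forward
pass is discarded without calling backward, and the next forward pass now
succeeds.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class ModelWithUnusedParam(nn.Module):
    """Hypothetical module with a parameter that never participates in forward."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(10, 10)
        self.unused = nn.Linear(10, 10)  # never used in forward()

    def forward(self, x):
        return self.used(x)


def main():
    # Single-process process group, for illustration only.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(ModelWithUnusedParam(), find_unused_parameters=True)

    # First forward pass: discard the output without calling backward.
    model(torch.randn(2, 10))

    # Second forward pass: previously this raised an error, because reduction
    # of the unused parameter's gradient had already been kicked off at the end
    # of the first forward pass; with reduction delayed until the first autograd
    # hook fires, discarding the output is fine.
    out = model(torch.randn(2, 10))
    out.sum().backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```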
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22219
Differential Revision: D16028698
Pulled By: pietern
fbshipit-source-id: c6aec2cd39c4a77746495d9cb1c9fb9c5ac61983