[Reland][DDP] Support not all outputs used in loss calculation (#61753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61753
Reland of https://github.com/pytorch/pytorch/pull/57081.
The main difference is that the original diff moved the `prepare_for_backward` check into the `DDPSink` backward pass, as part of a long-term plan to always call it within `DDPSink`, but that resulted in issues due to potential autograd engine races.
In particular, this doesn't work because `prepare_for_backward` sets `expect_autograd_hooks=true`, which enables autograd hooks to fire; however, there were several internal use cases where autograd hooks fired before `DDPSink` called `prepare_for_backward`, resulting in errors/regressions.
We instead keep the call to `prepare_for_backward` in the forward pass, but still run outputs through `DDPSink` when `find_unused_parameters=True`. As a result, outputs that are not used when computing the loss have `None` gradients, and we don't touch them if they are globally `None`. Note that the hooks still fire with an undefined gradient, which is how we avoid the Reducer erroring out with the message that some hooks did not fire.
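As a sketch of the user-visible behavior this enables (the `TwoHead` module and single-process `gloo` setup below are illustrative, not from this PR): a DDP-wrapped model returns two outputs but only one feeds the loss. With `find_unused_parameters=True`, the unused head's `.grad` is expected to remain `None` after `backward()` instead of the Reducer raising the "some hooks did not fire" error.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process gloo group, just so DDP can be constructed for the sketch.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

class TwoHead(nn.Module):
    """Hypothetical model: forward returns two outputs from shared features."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(4, 4)
        self.head_a = nn.Linear(4, 1)
        self.head_b = nn.Linear(4, 1)

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

model = DDP(TwoHead(), find_unused_parameters=True)
out_a, out_b = model(torch.randn(2, 4))
out_a.sum().backward()  # out_b never contributes to the loss

# head_b's hooks fire with an undefined gradient, so its .grad stays None
# and the Reducer does not error out; the used paths get real gradients.
assert model.module.head_b.weight.grad is None
assert model.module.head_a.weight.grad is not None

dist.destroy_process_group()
```

Before this change, the second output being absent from the loss would have tripped the Reducer's check that all gradient hooks fired.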
Added the unit tests that were part of the reverted diff.
ghstack-source-id: 135388925
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D29726179
fbshipit-source-id: 54c8819e0aa72c61554104723a5b9c936501e719