Fix Pipe + DDP for unused parameters, static graph (#60118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60118
Pipe + DDP has two issues:
1) With static graph, gradients are not synchronized on the first backward pass (i.e., the delayed allreduce is not run). Broken since https://github.com/pytorch/pytorch/pull/55248
2) With find_unused_parameters=True, gradient synchronization also does not occur. Broken since https://github.com/pytorch/pytorch/pull/57081. A repro sketch for both cases follows.
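For reference, a hedged repro sketch of the failing configuration (model, shapes, and launch details are illustrative, not the exact test code; assumes one process per rank with two local GPUs and the usual MASTER_ADDR/MASTER_PORT env vars):

```python
import torch
import torch.distributed as dist
import torch.distributed.rpc as rpc
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe
from torch.nn.parallel import DistributedDataParallel

dist.init_process_group("nccl")
rank = dist.get_rank()
rpc.init_rpc(f"worker{rank}", rank=rank, world_size=dist.get_world_size())

# Two-stage pipeline across two local GPUs, wrapped in DDP.
model = nn.Sequential(nn.Linear(16, 16).cuda(0), nn.Linear(16, 16).cuda(1))
pipe = Pipe(model, chunks=2)
ddp = DistributedDataParallel(pipe, find_unused_parameters=True)  # issue 2
# ddp._set_static_graph()  # issue 1: static graph

out_rref = ddp(torch.randn(8, 16).cuda(0))  # Pipe's forward returns an RRef
out_rref.local_value().sum().backward()     # grads are NOT allreduced here
```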
The root cause in both cases is that calling `DDPSink.apply(output_tensor)` does not invoke the custom `backward` of `DDPSink` when `output_tensor` is actually an `OwnerRRef`, which is the case when running DDP with `Pipe`. This is because `backward` is later run on `rref.local_value()`, whose autograd graph never recorded `DDPSink`.
To fix this, we unwrap the RRef and reconstruct it as needed, similar to the fix in https://github.com/pytorch/pytorch/pull/49908. The sketch below illustrates the idea.
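A minimal, self-contained sketch of the mechanism, using hypothetical stand-ins (`Sink` for `DDPSink`, `Box` for an `OwnerRRef`) rather than the actual DDP internals:

```python
import torch

class Sink(torch.autograd.Function):
    """Stand-in for DDPSink: backward is where DDP triggers the delayed
    allreduce / unused-parameter handling."""

    @staticmethod
    def forward(ctx, inp):
        return inp

    @staticmethod
    def backward(ctx, grad):
        print("Sink.backward ran (gradient sync would happen here)")
        return grad

class Box:
    """Stand-in for an OwnerRRef wrapping the Pipe output tensor."""

    def __init__(self, value):
        self._value = value

    def local_value(self):
        return self._value

# Broken: applying the sink to the wrapper records nothing in the inner
# tensor's autograd graph, so backward() on local_value() never reaches
# Sink.backward.
x = torch.ones(2, requires_grad=True)
broken = Sink.apply(Box(x * 2))
broken.local_value().sum().backward()  # nothing printed

# Fixed (simplified): unwrap, apply the sink to the local tensor, and
# reconstruct the wrapper so the sink sits in the graph that backward()
# actually traverses.
y = torch.ones(2, requires_grad=True)
fixed = Box(Sink.apply(Box(y * 2).local_value()))
fixed.local_value().sum().backward()   # prints the message above
```

Per the summary above, the actual change performs the analogous unwrap/apply/rewrap at the point where DDP applies `DDPSink` to the module output.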
To test:
All tests in pipe_with_ddp_test pass.
The reason these tests did not catch the errors earlier is that all ranks received the same model inputs, so even without gradient synchronization the grads stayed identical across ranks: the replicas start from the same model (guaranteed by DDP) and therefore compute the same local gradients. Fixed the tests to use different inputs across ranks, as sketched below.
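A hedged sketch of that test change, using hypothetical helpers (`make_input`, `assert_grads_synced`) rather than the exact pipe_with_ddp_test code:

```python
import torch
import torch.distributed as dist

def make_input(rank: int, batch_size: int = 8, dim: int = 16) -> torch.Tensor:
    # Seed per rank so each rank feeds the model a different batch.
    g = torch.Generator().manual_seed(rank)
    return torch.randn(batch_size, dim, generator=g)

def assert_grads_synced(model: torch.nn.Module) -> None:
    # If DDP's allreduce ran, the averaged grads are identical on every
    # rank; with per-rank inputs, this fails when synchronization is
    # skipped.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is None:
            continue
        gathered = [torch.empty_like(p.grad) for _ in range(world_size)]
        dist.all_gather(gathered, p.grad)
        for g in gathered:
            assert torch.allclose(g, gathered[0])
```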
ghstack-source-id: 131688187
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D29167283
fbshipit-source-id: fe62310db2dc6de8519eb361b1df8ae4dfce3ab8