Add unit test to ensure no gradients sync when calling ddp.module(input) (#20282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20282
Add a unit test to ensure that no gradients are synchronized when calling ddp.module(input), i.e., without invoking prepare_for_backward.
PyText depends on DDP for data-parallel distributed training. To support accumulating gradients locally before syncing them, we call orig_model.forward instead of ddp_model.forward. This adds a unit test so that future changes do not break that assumption; a sketch of such a test follows below.
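A minimal sketch of what such a test might look like, assuming a two-process gloo setup on CPU. The names (Net, run, the master address/port) are hypothetical placeholders, not the actual test harness from the PyTorch distributed test suite. Each rank feeds a different input, so if DDP had synchronized (averaged) the gradients, they would differ from the purely local ones the test expects.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


class Net(torch.nn.Module):
    # Hypothetical toy model; the gradient of sum(W @ x) w.r.t. W
    # depends only on the input x, which makes the check easy.
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(2, 2, bias=False)

    def forward(self, x):
        return self.fc(x)


def run(rank, world_size):
    # Hypothetical rendezvous settings for a local sketch.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = Net()
    ddp = DDP(model)

    # Each rank uses a different input, so synchronized gradients
    # would be the cross-rank average and differ from local ones.
    x = torch.ones(2, 2) * (rank + 1)

    # Calling ddp.module(...) bypasses DDP.forward, so
    # prepare_for_backward is never invoked and no allreduce runs;
    # gradients accumulate locally.
    ddp.module(x).sum().backward()

    # For this model, the local gradient of sum(fc(x)) w.r.t. the
    # weight is 2 * (rank + 1) in every entry. If gradients had been
    # averaged across the two ranks, every entry would be 3 instead.
    expected = torch.full((2, 2), 2.0 * (rank + 1))
    assert torch.allclose(model.fc.weight.grad, expected), \
        "gradients were synchronized unexpectedly"

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```

After accumulating gradients locally this way across several iterations, a caller can run a single ddp_model(input) forward/backward to trigger the usual synchronization, which is the pattern PyText relies on.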
Reviewed By: pietern, mrshenli
Differential Revision: D15263155
fbshipit-source-id: 7734e174f507690fb23ea6c52dffff4a93f9b151