Update on "[DDP] Support for multiple backwards"
Move prepare_for_backward into the _DDPSink backward pass instead of calling it in the DDP forward pass, so that multiple backwards can be run in DDP with retain_graph=True.
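For context, a minimal usage sketch of what this enables (the model, shapes, and device setup are illustrative only, and assume the default process group has already been initialized, e.g. via torch.distributed.init_process_group):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Illustrative model/wrapper; assumes init_process_group has already run
# and this rank owns the current CUDA device.
model = DDP(nn.Linear(10, 10).cuda(), device_ids=[torch.cuda.current_device()])
inp = torch.randn(8, 10, device="cuda")

loss = model(inp).sum()
loss.backward(retain_graph=True)  # first backward over the autograd graph
loss.backward()                   # second backward over the same retained graph
```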
Tests are added for regular DDP training (non-static graph, without find_unused_parameters), non-static graph training with unused parameters, and static graph training with/without find_unused_parameters.
Also includes a fix for static graph training, described in https://github.com/pytorch/pytorch/issues/58111.
Differential Revision: [D28855226](https://our.internmc.facebook.com/intern/diff/D28855226/)
[ghstack-poisoned]