[DDP] Support optim in backward after DDP init (#105995)
This allows in-backward optimizers to be configured after DDP
initialization, in addition to before it, which was already supported.
Differential Revision: [D47783347](https://our.internmc.facebook.com/intern/diff/D47783347/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105995
Approved by: https://github.com/fegin