Use side-stream in CPU to GPU copies in DDP (#50180) (#52270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50180
Resolves the regression reported in
https://github.com/pytorch/pytorch/issues/49819 by performing the CPU to GPU
copy on a background (side) stream, similar to what scatter does. For internal
use cases, the new behavior is gated by an environment variable; when the
variable is off, the previous behavior is kept.
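For illustration only, the side-stream copy pattern can be sketched roughly as
follows. This is not the code from this change: the helper name
copy_to_device_on_side_stream is hypothetical, the stream is created per call
for simplicity, and the environment-variable gate is omitted. The sketch falls
back to a plain copy when no CUDA device is available.

```python
import torch


def copy_to_device_on_side_stream(cpu_tensor, device):
    """Hypothetical helper: copy a CPU tensor to `device` on a side stream."""
    device = torch.device(device)
    if device.type != "cuda" or not torch.cuda.is_available():
        # No GPU involved: an ordinary copy is all that is needed.
        return cpu_tensor.to(device)

    # Assumption for this sketch: a fresh stream per call. Real code would
    # more likely reuse a long-lived background stream.
    side_stream = torch.cuda.Stream(device=device)
    with torch.cuda.stream(side_stream):
        # The copy is enqueued on the side stream. It only overlaps with
        # host work if the source tensor is in pinned memory.
        gpu_tensor = cpu_tensor.to(device, non_blocking=True)

    current = torch.cuda.current_stream(device)
    # Work enqueued later on the current stream waits for the copy.
    current.wait_stream(side_stream)
    # Inform the caching allocator that the tensor is also used on the
    # current stream, so its memory is not reused too early.
    gpu_tensor.record_stream(current)
    return gpu_tensor


device = "cuda" if torch.cuda.is_available() else "cpu"
src = torch.arange(6, dtype=torch.float32).reshape(2, 3)
dst = copy_to_device_on_side_stream(src, device)
```

The wait_stream and record_stream calls are what make the result safe to use
on the current stream after the copy was issued elsewhere.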
Test Plan: CI
Reviewed By: mrshenli, ngimel
Differential Revision: D25818170
fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75