Revert D26955317: Perform appropriate CUDA stream synchronization in distributed autograd.
Test Plan: revert-hammer
Differential Revision:
D26955317 (https://github.com/pytorch/pytorch/commit/0b84f45f03fc75f85ee3bbd3924689cd261ce9bd)
Original commit changeset: eace6d4f91d4
fbshipit-source-id: 1f322b4d7cf7d1a7e6caf3194c6f0bf163d45850