Fix CUDA stream syncing bug in allgather and reduce_scatter (#19631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19631
ghimport-source-id: edc47e77d6ef03e966944ff98eefc22f2574eeaa
Reviewed By: mrshenli
Differential Revision: D15110077
Pulled By: mxw
fbshipit-source-id: 27a68308ade5ea511e2ea568a071eedb5d21c1ba