Synchronize RRef.to_here() CUDA Streams properly (#54932)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54932
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D27684022
Pulled By: pbelevich
fbshipit-source-id: 2bae51ab6649258d0219ca4e9dbbf45ac6a76c28