Fix CUDA RPC Stream Synchronization (#50949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50949
When converting an RPC Message into Python objects, we were not using
a CUDAFuture for the chained Future. As a result, the CUDA streams were
not synchronized when calling `rpc_async(...).wait()`. This commit
uses the `Future::then` API to create the chained Future, which
creates a CUDAFuture if the existing Future is a CUDA one.
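The sketch below is a hypothetical, pure-Python toy model of why routing through `then` matters. `ToyFuture`, `ToyCudaFuture`, `create_instance` and `streams_synced` are illustrative names, not PyTorch APIs. It assumes, as the description above implies, that `then` lets the parent future decide which future type to create for the child. A child built by hand as a plain future loses the CUDA-aware `wait()`, while one created through `then` keeps it.

```python
from typing import Callable, List


class ToyFuture:
    """Hypothetical stand-in for a generic future (not torch.futures.Future)."""

    def __init__(self) -> None:
        self._done = False
        self._value: object = None
        self._callbacks: List[Callable[[object], None]] = []

    def create_instance(self) -> "ToyFuture":
        # Factory hook: subclasses override this so chained futures keep their type.
        return ToyFuture()

    def then(self, callback: Callable[[object], object]) -> "ToyFuture":
        # The parent, not the caller, chooses the child's concrete type.
        child = self.create_instance()
        self.add_done_callback(lambda value: child.set_result(callback(value)))
        return child

    def add_done_callback(self, cb: Callable[[object], None]) -> None:
        if self._done:
            cb(self._value)
        else:
            self._callbacks.append(cb)

    def set_result(self, value: object) -> None:
        self._value = value
        self._done = True
        pending, self._callbacks = self._callbacks, []
        for cb in pending:
            cb(value)

    def wait(self) -> object:
        if not self._done:
            raise RuntimeError("toy future is not completed")
        return self._value


class ToyCudaFuture(ToyFuture):
    """Hypothetical CUDA-aware future. A real one would record CUDA events on
    completion and make the caller's current stream wait on them in wait()."""

    def __init__(self) -> None:
        super().__init__()
        self.streams_synced = False

    def create_instance(self) -> "ToyFuture":
        return ToyCudaFuture()

    def wait(self) -> object:
        # Placeholder for the stream synchronization step.
        self.streams_synced = True
        return super().wait()


parent = ToyCudaFuture()

# Buggy pattern: the chained future is constructed directly as a plain future.
manual = ToyFuture()
parent.add_done_callback(lambda v: manual.set_result(v * 2))

# Fixed pattern: then() lets the parent pick the child's type.
chained = parent.then(lambda v: v * 2)

parent.set_result(21)
print(type(manual).__name__, manual.wait())    # ToyFuture 42
print(type(chained).__name__, chained.wait())  # ToyCudaFuture 42
```

Both children receive the same value; only the one created through `then` goes through the (toy) synchronizing `wait()`.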
fixes #50881
fixes #50839
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D26020458
Pulled By: mrshenli
fbshipit-source-id: 25195fbc10b99f4c401ec3ed7a382128464b5f08