pytorch
d1088de5 - Let RRef getValue() synchronize CUDA streams (#56895)

4 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56895

PR #54932 fixes CUDA stream synchronization between RPC-created OwnerRRef and UserRRef when `to_here()` is invoked. However, there are two more gaps.

1. The RRef value can be accessed on the owner directly through `local_value`, which bypasses the fix in #54932.
2. When an RRef is created directly through the RRef ctor instead of RPC, the OwnerRRef won't be able to correctly record CUDA events.

This PR fixes 1 by letting the current streams wait for the RRef's recorded CUDA events before returning the value in `RRef::getValue()`.

For 2, more discussion is needed to decide whether we should add a `devices` argument to the RRef ctor, or whether the RRef ctor should inspect the given values.

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D27992775

Pulled By: mrshenli

fbshipit-source-id: ed0e5bfbf715460208c85e46dd3317deef17f8fe
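The fix for gap 1 can be pictured with a small toy model. This is only a sketch of the idea, not the actual C++ change: `ToyEvent`, `ToyStream`, and `ToyOwnerRRef` are hypothetical stand-ins invented for illustration, and no real CUDA work happens. The assumption is that the owner records one event per device when the value is set, and that `get_value()` makes the caller's current stream on each device wait on those events before handing the value back.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a CUDA event recorded on a producer stream."""
    device: int


@dataclass
class ToyStream:
    """Hypothetical stand-in for a CUDA stream; it only logs the events it waited on."""
    device: int
    waited: List[ToyEvent] = field(default_factory=list)

    def wait_event(self, event: ToyEvent) -> None:
        # A real stream would block later work until the event completes.
        self.waited.append(event)


class ToyOwnerRRef:
    """Hypothetical owner-side RRef that keeps events alongside its value."""

    def __init__(self) -> None:
        self._value: Any = None
        self._events: List[ToyEvent] = []

    def set_value(self, value: Any, devices: List[int]) -> None:
        # Assumed behaviour: record one event per device the value lives on.
        self._value = value
        self._events = [ToyEvent(d) for d in devices]

    def get_value(self, current_streams: Dict[int, ToyStream]) -> Any:
        # Make each current stream wait on the recorded event for its
        # device before returning, so callers cannot race the producer.
        for event in self._events:
            current_streams[event.device].wait_event(event)
        return self._value


rref = ToyOwnerRRef()
rref.set_value("tensor-placeholder", devices=[0, 1])
streams = {0: ToyStream(0), 1: ToyStream(1)}
value = rref.get_value(streams)
```

In this model, any accessor that goes through `get_value()` picks up the synchronization, which is why placing the wait there covers direct owner-side access as well. Gap 2 corresponds to a `ToyOwnerRRef` whose value is set without a device list: no events are recorded, so there is nothing to wait on.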