Avoid using FutureNCCL before it's ready (#48561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48561
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
WorkNCCL allows one to extract a FutureNCCL through getFuture(). There is one instance of this method being called by ProcessGroupNCCL itself, in order to attach a callback to it. This happened _before_ the work was actually launched; however, FutureNCCL _always_ invokes its callbacks immediately inline. The events that the FutureNCCL was using had not been recorded yet, so blocking on them was a no-op. Moreover, the function being called was installed by the generic ProcessGroup superclass, which is not CUDA-aware, and thus probably made no use of the CUDA events or streams.
https://github.com/pytorch/pytorch/blob/383abf1f0c1f74e0f471d47e505895d1b0e6bb20/torch/lib/c10d/ProcessGroup.cpp#L66
In short: I believe that creating a FutureNCCL and attaching a callback to it was equivalent to just invoking that function directly, without anything CUDA-specific. I am thus converting the code to do exactly that, in order to simplify it.
Note that, given the comment, I don't think this was the original intention of that code. It seems that the function was intended to be run once the work finished. However, I am not familiar with this code, and I don't want to introduce any functional changes.
ghstack-source-id: 118180037
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D25210337
fbshipit-source-id: 54033c814ac77641cbbe79b4d01686dfc2b45495