pytorch
6157f8ae - Use fresh stream from pool for each FutureNCCL callback (#48498)



3 years ago

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48498

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks this transition up into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it changes).

---

FutureNCCL has a dedicated CUDA stream that it sets as current when running callbacks. This stream is initialized by ProcessGroupNCCL, which extracts it from the global ATen pool. To decouple FutureNCCL from that specific ProcessGroup and make it more generic, this commit makes FutureNCCL extract a fresh stream from the ATen pool each time it needs one.

This introduces a functional change, because it removes the implicit synchronization and ordering between the callbacks of the same Future. Such an ordering is hard to guarantee in the general case anyway. For example, a user could attach a new callback just after the future becomes completed, and that callback would then run inline, immediately, out of order with respect to the other callbacks. (There are ways to "fix" this, but they are complicated.) NCCL got around this because its futures are already marked complete when they are returned, but it could still run into issues if multiple threads were adding callbacks simultaneously.

Note that it is still possible to enforce ordering between callbacks, but it must now be done explicitly. Namely, instead of this:

```
fut.then(cb1)
fut.then(cb2)
```

one must now do:

```
fut.then(cb1).then(cb2)
```

ghstack-source-id: 118180029

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25177559

fbshipit-source-id: 4d4e73ea7bda0ea65066548109b9ea6d5b465599
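The ordering change can be illustrated with a small pure-Python model. Everything in it is hypothetical: `ToyFuture`, `get_stream_from_pool`, and the four-entry round-robin pool of integer "streams" are illustrative stand-ins, not the real FutureNCCL, ATen, or `torch.futures.Future` APIs. In the model, each `then` callback draws a fresh stream, so two callbacks attached separately to the same future end up on different streams and nothing orders their work. Chaining makes the second callback run only after the first callback's future completes.

```python
import itertools
import threading
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for ATen's stream pool: "streams" are integer ids
# handed out round-robin. The pool size of 4 is an arbitrary assumption.
_POOL = itertools.cycle(range(4))
_POOL_LOCK = threading.Lock()


def get_stream_from_pool() -> int:
    """Return the next stream id (toy model, not a real ATen call)."""
    with _POOL_LOCK:
        return next(_POOL)


class ToyFuture:
    """Minimal future with a `then` method; a sketch, not FutureNCCL."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._done = False
        self._value: Any = None
        self._pending: List[Callable[[], None]] = []

    def value(self) -> Any:
        return self._value

    def mark_completed(self, value: Any) -> None:
        with self._lock:
            self._value, self._done = value, True
            pending, self._pending = self._pending, []
        for run in pending:  # deferred callbacks run outside the lock
            run()

    def then(self, fn: Callable[["ToyFuture", int], Any]) -> "ToyFuture":
        child = ToyFuture()

        def run() -> None:
            # A fresh stream per callback, instead of one dedicated stream
            # shared by every callback of this future.
            stream = get_stream_from_pool()
            child.mark_completed(fn(self, stream))

        with self._lock:
            if not self._done:
                self._pending.append(run)
                return child
        run()  # future already completed: callback runs inline, immediately
        return child


log: List[Tuple[str, int]] = []


def cb1(fut: ToyFuture, stream: int) -> int:
    log.append(("cb1", stream))
    return fut.value() + 1


def cb2(fut: ToyFuture, stream: int) -> int:
    log.append(("cb2", stream))
    return fut.value() * 10


# Independent callbacks: each runs on its own stream, so nothing orders
# the work cb2 would enqueue after the work cb1 would enqueue.
fut = ToyFuture()
a, b = fut.then(cb1), fut.then(cb2)
fut.mark_completed(1)

# Explicit chaining: cb2 runs only after cb1's future completes and
# receives cb1's result (1 + 1) as its input.
fut2 = ToyFuture()
chained = fut2.then(cb1).then(cb2)
fut2.mark_completed(1)

# A callback attached after completion runs inline, right away.
late = fut.then(cb2)
```

This model captures only host-side ordering. A real CUDA-aware future would also need the second callback's stream to wait for any GPU work enqueued by the first, and this sketch does not attempt to model that. The `late` case mirrors the race described above: a callback attached after the future has completed runs immediately on the calling thread, regardless of what other callbacks exist.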

Committer: facebook-github-bot

Parents: 8fb52e7f
