Add multi-GPU support to FutureNCCL (#48500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48500
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
After the previous changes, this is now much simpler than it sounds. For the most part it just consists of repeating some operations once per device (e.g., recording and blocking on events). Funnily enough, we already had a vector of events, even though we only ever stored one element in it (this probably stems from the fact that this code is shared with WorkNCCL, which can hold more than one event). Here, we now also store a vector of device indices.
Perhaps the only non-trivial part is that for "follow-up" Futures (the ones produced by callbacks) we can't know in advance which devices the result will reside on, so we must determine them dynamically when we receive the result, by inspecting it. That's also easier than it sounds, because we already have a DataPtr extractor.
ghstack-source-id: 118180022
Test Plan: Unit tests (I should probably add new ones)
Reviewed By: mrshenli
Differential Revision: D25177556
fbshipit-source-id: 41ef39ec0dc458e341aa1564f2b9f2b573d7fa9f