Record CUDA events for "follow-up" FutureNCCL inside markCompleted (#48499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48499
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
We can merge and "hide" a whole bunch of CUDA-related logic if we store and record the CUDA events that correspond to the completion of a FutureNCCL when we call markCompleted, rather than splitting that logic between the constructor, the `then` method, and a wrapper around the callback.
A more concrete reason for this change is that I'll soon add support for multi-device, and in that case we can't know which devices a value will reside on until we actually receive that value (and we don't want to record an event on every device, as then we might "over-synchronize").
To me, this also makes more conceptual sense: the moment when we store a value on the future, which is the "signal" that the future is now ready, should also be the time at which we record the events needed to synchronize with that value. Though this may just be personal preference.
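
To make the shape of the change concrete, here is a minimal sketch (not the actual diff): the `FutureNCCLSketch` class, its `events_` member, and the `extractDevices` helper are made-up names for illustration, and it assumes `markCompleted` can be overridden on `c10::ivalue::Future`. The point is simply that events are recorded at the moment the value is handed to `markCompleted`, once we know which devices it lives on:

```cpp
#include <vector>
#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAStream.h>

struct FutureNCCLSketch : c10::ivalue::Future {
  using c10::ivalue::Future::Future;

  void markCompleted(c10::IValue value) override {
    // Record one event per device the value resides on, on the current
    // stream of that device, right when the value becomes available.
    for (const c10::DeviceIndex idx : extractDevices(value)) {
      at::cuda::CUDAEvent event;
      event.record(c10::cuda::getCurrentCUDAStream(idx));
      events_.push_back(std::move(event));
    }
    c10::ivalue::Future::markCompleted(std::move(value));
  }

 private:
  // Hypothetical helper: collect the CUDA devices of the tensors held by
  // the value (single tensor or list of tensors).
  static std::vector<c10::DeviceIndex> extractDevices(const c10::IValue& value) {
    std::vector<c10::DeviceIndex> devices;
    if (value.isTensor() && value.toTensor().is_cuda()) {
      devices.push_back(value.toTensor().device().index());
    } else if (value.isTensorList()) {
      for (const at::Tensor& t : value.toTensorVector()) {
        if (t.is_cuda()) {
          devices.push_back(t.device().index());
        }
      }
    }
    return devices;
  }

  std::vector<at::cuda::CUDAEvent> events_;
};
```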
ghstack-source-id: 118180034
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D25177557
fbshipit-source-id: 53d4bcdfb89fa0d11bb7b1b94db5d652edeb3b7b