SemanticDiff

pytorch
696e30af - Fix ProcessGroupNCCL profiling when profiler is not run with use_cuda (#48946)


Commit

3 years ago

Fix ProcessGroupNCCL profiling when profiler is not run with use_cuda (#48946)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48946

Move recordFunctionEndCallback to after the blocking portion of launching the NCCL kernel, and remove addCallback, since it runs the lambda inline anyway and triggers unnecessary CUDA stream logic. If we want CUDA operations such as NCCL kernels to be profiled accurately, we should run the profiler with use_cuda=True. However, we are currently debugging a deadlock in the use_cuda=True case; the fix is being tracked in #48987.

To ensure that the tests are no longer flaky, we submitted this PR to ci-all (#48947) and ran the test many times while SSH'd into the CI machine.

ghstack-source-id: 118330130

Test Plan: CI

Reviewed By: mrzzd

Differential Revision: D25368322

fbshipit-source-id: 7d17036248a3dcd855e58addc383bba64d6bc391
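The remark that addCallback "runs the lambda inline" reflects a common property of futures: attaching a callback to a future that has already completed invokes the callback immediately, on the caller's thread, before the attach call returns. The sketch below illustrates this with Python's standard concurrent.futures.Future purely as an analogy. It does not use PyTorch's internal future type, and record_end is a hypothetical stand-in for recordFunctionEndCallback.

```python
from concurrent.futures import Future

events = []

def record_end():
    # Hypothetical stand-in for a profiler's end-of-range callback
    # (an analogy only, not PyTorch's recordFunctionEndCallback).
    events.append("end")

fut = Future()
fut.set_result(None)  # The future is already complete.

events.append("before add_done_callback")
# Since fut is already done, the callback is invoked immediately,
# before add_done_callback returns.
fut.add_done_callback(lambda f: record_end())
events.append("after add_done_callback")

print(events)
```

In this analogy, the printed order shows "end" between the two surrounding events. When the callback would run inline anyway, calling the end callback directly at that point produces the same ordering without the extra callback machinery.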

Author

rohan-varma

Committer

facebook-github-bot
