[Dist profiling] Fix ProcessGroupNCCL collective profiling (#55204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55204
Implements a fix discussed offline with pritamdamia87: run the profiler's end callbacks only after `CUDAFuture`'s `wrapCallback` has ensured appropriate stream synchronization. Also re-enables the relevant distributed profiling tests that were previously disabled for ProcessGroupNCCL.
Note that the profiling infrastructure has moved toward torch.profiler and CUPTI as the primary way to trace CUDA kernels; supporting distributed collectives there will require further discussion with ilia-cher. In the meantime, this PR improves the usability of torch.autograd.profiler with distributed collectives.
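The core idea of the fix can be illustrated generically: rather than recording the end of an async operation when the launching call returns, attach the end callback to the operation's future so it fires only after completion. The sketch below uses Python's `concurrent.futures` as a stand-in (the real fix operates on `CUDAFuture` in C++ and synchronizes CUDA streams); all names here are illustrative, not PyTorch APIs.

```python
from concurrent.futures import ThreadPoolExecutor
import time

events = []

def collective(duration):
    # Stand-in for an async NCCL collective; a sleep simulates kernel work.
    time.sleep(duration)
    events.append("work_done")

def end_callback(fut):
    # Attached to the future, so it runs only after the work has actually
    # finished -- analogous to running the profiler's end callback after
    # wrapCallback has ensured synchronization.
    events.append("profiler_end")

events.append("profiler_start")  # profiling begins before the async launch
with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(collective, 0.05)
    fut.add_done_callback(end_callback)

print(events)  # profiler_end is ordered after work_done
```

With this structure, the recorded end time reflects when the collective truly finished rather than when the non-blocking launch returned.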
ghstack-source-id: 127357995
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27491711
fbshipit-source-id: cec7703a4c5d59b5023b0aa8fef4c2e3fb8d37d0