pytorch
dede0bb0 - [NCCL] Use OptionalCUDAGuard in ProcessGroupNCCL::WorkNCCL::synchronizeInternal (#98895)

[NCCL] Use OptionalCUDAGuard in ProcessGroupNCCL::WorkNCCL::synchronizeInternal (#98895)

Using `CUDAGuard` does a redundant `set_device` in the following loop:

```C++
{
  for (auto& device : devices_) {
    at::cuda::CUDAGuard gpuGuard(device); // set device
    // ...
    // ~gpuGuard() sets original device
  }
  // ...
}
```

It would be more efficient to use `OptionalCUDAGuard` as follows:

```C++
{
  at::cuda::OptionalCUDAGuard gpuGuard;
  for (auto& device : devices_) {
    gpuGuard.set_index(device.index()); // set device
    // ...
  }
  // ...
  // ~gpuGuard() sets original device
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98895
Approved by: https://github.com/mrshenli