pytorch
527ab134 - [NCCL] Explicitly Abort NCCL Communicators on Process Group Destruction (#40241)

Commit

4 years ago

[NCCL] Explicitly Abort NCCL Communicators on Process Group Destruction (#40241)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40241

We abort incomplete NCCL communicators in the ProcessGroupNCCL destructor; otherwise, pending NCCL communicators may block other CUDA ops.

Closes: https://github.com/pytorch/pytorch/issues/32231
ghstack-source-id: 106469423

Test Plan: CI/Sandcastle

Reviewed By: jiayisuse

Differential Revision: D22103662

fbshipit-source-id: 1f6f88b56bd7a5e9ca5a41698995a76e60e8ad9f
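
The destructor behavior described in the commit message can be illustrated with a minimal sketch. This is not the code from the pull request: `MockComm`, `MockProcessGroup`, `getComm`, and the `completed` flag are hypothetical stand-ins chosen so the example compiles without NCCL or CUDA. It only shows the general pattern of aborting any communicator that still has pending work when the owning process group is destroyed.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an NCCL communicator (not the real NCCL API).
struct MockComm {
  bool completed = false;  // assumed flag: true once all queued work has finished
  bool aborted = false;    // set when abort() is called
  void abort() { aborted = true; }
};

// Hypothetical, simplified process group that caches communicators per device key.
class MockProcessGroup {
 public:
  // Returns the cached communicator for a device key, creating it on first use.
  std::shared_ptr<MockComm> getComm(const std::string& deviceKey) {
    auto& comms = devComms_[deviceKey];
    if (comms.empty()) {
      comms.push_back(std::make_shared<MockComm>());
    }
    return comms.front();
  }

  // On destruction, abort every communicator that has not completed, so that
  // pending work cannot keep blocking other operations after the group is gone.
  ~MockProcessGroup() {
    for (auto& entry : devComms_) {
      for (auto& comm : entry.second) {
        if (!comm->completed) {
          comm->abort();
        }
      }
    }
  }

 private:
  std::unordered_map<std::string, std::vector<std::shared_ptr<MockComm>>> devComms_;
};
```

In this sketch, a communicator obtained from `getComm` and still marked incomplete when the `MockProcessGroup` goes out of scope ends up with `aborted == true`, while a completed one is left untouched.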

Author

osalpekar

Committer

facebook-github-bot

Parents

fe18dcd6
