pytorch
1149ba55 - Revert "[NCCL] Add experimental Nonblocking NCCL Fault Tolerance/Checking (#95715)"

Committed 1 year ago
This reverts commit a33eac398881cfa9aad679ceffd28ace3fa44f01.

Reverted https://github.com/pytorch/pytorch/pull/95715 on behalf of https://github.com/PaliC due to a regression: the PR caused distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_baseline_aot_eager_multiprocess to time out (https://hud.pytorch.org/failure/distributed%2Ftest_dynamo_distributed.py%3A%3ATestMultiProc%3A%3Atest_ddp_baseline_aot_eager_multiprocess).