pytorch
f25cdf8a - Revert "Rewrite NCCL watchdog to more reliably throw timeout (#97066)"

Commit
1 year ago
Revert "Rewrite NCCL watchdog to more reliably throw timeout (#97066)"

This reverts commit 95e8d0c39ec523f5a35c31155285fd4242928d8a.

Reverted https://github.com/pytorch/pytorch/pull/97066 on behalf of https://github.com/clee2000 due to: sorry, but I think this broke periodic multigpu tests.
https://hud.pytorch.org/pytorch/pytorch/commit/416bac5b813a181753afade781ae30f4f0843586
https://github.com/pytorch/pytorch/actions/runs/4505085943/jobs/7930826040