pytorch
e9c3ce30 - Fix flaky test_barrier_timeout_global. (#57523)

Commit
4 years ago
Fix flaky test_barrier_timeout_global. (#57523)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57523

`_test_barrier_timeout` would run a barrier on rank 0 and sleep for `timeout` on the other ranks. When the other ranks were faster, they would enter the sleep call well before rank 0 entered the barrier. They would then exit before rank 0's timeout had elapsed, so rank 0 would receive a connection closed error instead of a timeout error. The barrier call would therefore return before the timeout, and the subsequent assertion would fail.

#Closes: https://github.com/pytorch/pytorch/issues/57176
ghstack-source-id: 128278775

Test Plan:
1) waitforbuildbot
2) Tested synthetically by forcing a rank to exit earlier.

Reviewed By: rohan-varma

Differential Revision: D28170821

fbshipit-source-id: a67456a1784dd0657f264c4f5498638e0aa00de2
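
The race described in the summary comes down to comparing two deadlines. The sketch below is only a timing model of that description, not the test code and not the fix applied by this commit; the function name `barrier_outcome` and its parameters are hypothetical and chosen for illustration.

```python
def barrier_outcome(barrier_start: float, sleep_start: float, timeout: float) -> str:
    """Model which error the rank waiting in the barrier would observe.

    barrier_start: time at which the barrier rank enters the barrier (hypothetical).
    sleep_start:   time at which a peer rank begins sleeping for `timeout` (hypothetical).
    timeout:       barrier timeout, also used as the peers' sleep duration.
    """
    barrier_deadline = barrier_start + timeout  # when the barrier would time out
    peer_exit = sleep_start + timeout           # when the sleeping peer wakes and exits
    if peer_exit < barrier_deadline:
        # The peer is gone before the barrier times out, so the waiting rank
        # sees its connection drop rather than a timeout.
        return "connection closed"
    return "timeout"


# A peer that starts sleeping 0.5s before the barrier begins exits too early.
print(barrier_outcome(barrier_start=0.5, sleep_start=0.0, timeout=2.0))
# A peer that starts sleeping after the barrier begins outlives the timeout.
print(barrier_outcome(barrier_start=0.0, sleep_start=0.5, timeout=2.0))
```

Under this model, the test is only deterministic if every peer is guaranteed to stay alive until after the barrier deadline, which is the condition the flaky version did not enforce.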