pytorch
ea06db94 - Release GIL during DDP construction. (#40495)

Commit
4 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495

As part of debugging flaky ddp_under_dist_autograd tests, I realized we were running into the following deadlock.

1) Rank 0 would go into DDP construction, hold GIL and wait for broadcast in DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before the DDP construction.
3) The RRef fetch call is done on Rank 0 and tries to acquire GIL.
4) We now have a deadlock since Rank 0 is waiting for Rank 3 to enter the collective and Rank 3 is waiting for Rank 0 to release GIL.

ghstack-source-id: 106534442

Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot

Differential Revision: D22205180

fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a
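
The deadlock described in the summary can be reproduced in miniature with plain Python threads. The sketch below is only an illustration and not the actual patch: `simulate`, `rank0_gil`, and `broadcast` are hypothetical names, a `threading.Lock` stands in for Rank 0's GIL, a `threading.Barrier` stands in for the broadcast collective, and timeouts are used so that the deadlocked case terminates instead of hanging.

```python
import threading


def simulate(release_gil: bool, timeout: float = 0.5) -> bool:
    """Return True if both ranks get through the collective, False otherwise.

    All names here are illustrative stand-ins, not PyTorch APIs.
    """
    rank0_gil = threading.Lock()        # stand-in for Rank 0's GIL
    broadcast = threading.Barrier(2)    # stand-in for the broadcast collective
    rank0_in_ctor = threading.Event()   # set once Rank 0 is inside "DDP construction"
    finished = {"rank0": False, "rank3": False}

    def rank0() -> None:
        # Step 1: Rank 0 enters DDP construction while holding the GIL.
        rank0_gil.acquire()
        rank0_in_ctor.set()
        if release_gil:
            # The idea behind the fix: drop the GIL before blocking on the collective.
            rank0_gil.release()
        try:
            broadcast.wait(timeout=timeout)
            finished["rank0"] = True
        except threading.BrokenBarrierError:
            pass  # the other rank never joined the collective
        finally:
            if not release_gil:
                rank0_gil.release()

    def rank3() -> None:
        # Step 2: Rank 3 is slower and issues an RRef fetch first.
        rank0_in_ctor.wait()
        # Step 3: the fetch is served on Rank 0 and needs Rank 0's GIL.
        if not rank0_gil.acquire(timeout=timeout):
            return  # Step 4: the GIL is never released, so the fetch cannot complete
        rank0_gil.release()
        try:
            broadcast.wait(timeout=timeout)
            finished["rank3"] = True
        except threading.BrokenBarrierError:
            pass  # the collective already timed out on Rank 0

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(finished.values())


if __name__ == "__main__":
    print("GIL held during construction:    ", simulate(release_gil=False))
    print("GIL released during construction:", simulate(release_gil=True))
```

With the lock held across the barrier wait, neither simulated rank makes progress and `simulate` returns `False`; releasing it first lets the simulated RRef fetch complete, both ranks reach the barrier, and `simulate` returns `True`.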