Release GIL during DDP construction. (#40495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495
While debugging the flaky ddp_under_dist_autograd tests, I found that we
were hitting the following deadlock:
1) Rank 0 enters DDP construction, holds the GIL, and blocks waiting for the
broadcast performed during construction.
2) Rank 3 is a little slower and issues an RRef fetch call before it enters
DDP construction.
3) The RRef fetch call is served on Rank 0, where it tries to acquire the
GIL.
4) We now have a deadlock: Rank 0 is waiting for Rank 3 to enter the
collective, and Rank 3 is waiting for Rank 0 to release the GIL.
This change releases the GIL during DDP construction, so the RRef fetch on
Rank 0 can proceed while Rank 0 is blocked in the broadcast.
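The interleaving above can be reproduced with a toy model. This is only a
sketch, not PyTorch code: a threading.Lock stands in for Rank 0's GIL, a
two-party threading.Barrier stands in for the broadcast, and the names
run_scenario, release_gil and timeout are hypothetical. Timeouts are used so
that the deadlock shows up as a False result instead of a hang.

```python
import threading


def run_scenario(release_gil: bool, timeout: float = 0.5) -> bool:
    """Return True if both simulated ranks get through the broadcast."""
    gil = threading.Lock()               # stand-in for rank 0's GIL (assumption)
    broadcast = threading.Barrier(2)     # stand-in for the DDP broadcast
    in_construction = threading.Event()  # forces the "rank 3 is slower" ordering
    done = {"rank0": False, "rank3": False}

    def rank0() -> None:
        gil.acquire()                    # step 1: enter DDP construction
        in_construction.set()
        if release_gil:
            gil.release()                # drop the GIL before blocking
        try:
            broadcast.wait(timeout=timeout)
            done["rank0"] = True
        except threading.BrokenBarrierError:
            pass                         # rank 3 never joined the collective
        finally:
            if release_gil:
                gil.acquire()            # re-take the GIL before returning
            gil.release()

    def rank3() -> None:
        in_construction.wait()           # step 2: rank 3 is the slower rank
        # step 3: the RRef fetch is served on rank 0 and needs its GIL
        if not gil.acquire(timeout=timeout):
            return                       # step 4: fetch is stuck behind rank 0
        gil.release()
        try:
            broadcast.wait(timeout=timeout)
            done["rank3"] = True
        except threading.BrokenBarrierError:
            pass                         # rank 0 already gave up on the barrier

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(done.values())
```

In this model, run_scenario(release_gil=False) returns False once the
timeouts expire, and run_scenario(release_gil=True) returns True.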
ghstack-source-id: 106534442
Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot
Differential Revision: D22205180
fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a