Remove lock from GraphTask::set_exception_without_signal. (#45867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867
In most cases the lock order is: acquire a lock in local autograd first, then
acquire a lock in DistAutogradContext.
In `set_exception_without_signal` the order was reversed, and as a result TSAN
reported potential deadlocks in our tests. To fix this, I removed the lock from
`set_exception_without_signal` and used `std::atomic::exchange` instead.
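A minimal sketch of the idea, not the actual GraphTask code: `TaskSketch` and
`set_error_once` are hypothetical names, and the error flag is assumed to be a
`std::atomic<bool>`. Since `exchange` stores the new value and returns the
previous one in a single atomic step, only the first caller sees `false`, so no
mutex is needed and the lock-order inversion cannot occur on this path.

```cpp
#include <atomic>

// Hypothetical stand-in for GraphTask; only the error flag is modeled.
class TaskSketch {
 public:
  // Returns true only for the first caller. exchange() atomically stores
  // true and returns the previous value, so no lock is taken here.
  bool set_error_once() {
    return !has_error_.exchange(true);
  }

  // Reads the current state of the flag.
  bool has_error() const {
    return has_error_.load();
  }

 private:
  // Assumption: the flag is a std::atomic<bool>, initially false.
  std::atomic<bool> has_error_{false};
};
```

A caller would do any one-time error handling only when `set_error_once()`
returns true; later callers see false and skip it.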
In addition, I fixed TestE2E to ensure that it uses the appropriate timeout.
TestE2EProcessGroup was flaky for these two reasons (the lock-order inversion
and the timeout) and is now fixed.
ghstack-source-id: 113592709
Test Plan: waitforbuildbot.
Reviewed By: albanD
Differential Revision: D24120962
fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66