pytorch
fd41ed1c - Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939)

Committed 3 years ago
Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51939

TestTrainingLoop - TestE2ETensorPipe was flaky because background RPCs could still be in flight while the assertions ran. The test did not wait for all RPCs on the agent to finish, so the assertions failed intermittently.

To resolve this, this PR calls join() and shutdown() on the RPC agent so that no further RPCs are in progress. It then asserts the map sizes to ensure that no leaks occurred.

In addition, this PR adds a messageIdToTimeout map to look up the timeout (expiration time) associated with a given messageId. This ensures that the correct entry is removed from the timeout map. The previous solution passed the expirationTime through the lambda, but there was no guarantee that the lambda would read the response to the request that had just been sent.

ghstack-source-id: 121412604

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26331585

fbshipit-source-id: a41e0534d7d4dfd240446e661e5541311931c7d7
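The messageIdToTimeout fix amounts to keeping a reverse index from message id to expiration time, so a response removes its own timeout entry no matter which callback happens to read it. Below is a minimal Python sketch of that bookkeeping, not the agent's actual C++ code. The class `TimeoutTracker`, its methods, and the `timeout_map` layout (expiration time to list of message ids) are hypothetical. Only the idea of a messageIdToTimeout lookup comes from the commit.

```python
from collections import defaultdict
from typing import Dict, List


class TimeoutTracker:
    """Hypothetical sketch of per-message timeout bookkeeping.

    timeout_map:           expiration time -> ids of messages expiring then
    message_id_to_timeout: message id -> its expiration time (reverse index)
    """

    def __init__(self) -> None:
        self.timeout_map: Dict[float, List[int]] = defaultdict(list)
        self.message_id_to_timeout: Dict[int, float] = {}

    def on_send(self, message_id: int, timeout_s: float, now: float) -> None:
        # Record the expiration time in both directions.
        expiration = now + timeout_s
        self.timeout_map[expiration].append(message_id)
        self.message_id_to_timeout[message_id] = expiration

    def on_response(self, message_id: int) -> None:
        # Look up the expiration time by the id carried in the response,
        # rather than trusting a value captured when some request was sent.
        expiration = self.message_id_to_timeout.pop(message_id, None)
        if expiration is None:
            return  # already timed out, or an unknown id
        ids = self.timeout_map[expiration]
        ids.remove(message_id)
        if not ids:
            del self.timeout_map[expiration]

    def expire(self, now: float) -> List[int]:
        # Remove and return the ids whose expiration time has passed.
        expired: List[int] = []
        for expiration in sorted(k for k in self.timeout_map if k <= now):
            for mid in self.timeout_map.pop(expiration):
                self.message_id_to_timeout.pop(mid, None)
                expired.append(mid)
        return expired


# Usage: responses may arrive out of order, and each removes its own entry.
tracker = TimeoutTracker()
tracker.on_send(1, timeout_s=5.0, now=0.0)
tracker.on_send(2, timeout_s=10.0, now=0.0)
tracker.on_response(2)
print(tracker.expire(now=6.0))
```

Once every request has either received a response or timed out, both maps end up empty. A leak check like the one in the test, which runs after join() and shutdown(), relies on exactly that property.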