Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51939
TestTrainingLoop - TestE2ETensorPipe was flaky because background RPCs could
still be in flight while the test performed its assertions. Since the test did
not wait for all RPCs on the agent to finish, these assertions could fail.
To resolve this, this PR calls join() and shutdown() on the RPC agent so that
no further RPCs run. The test then asserts on the map sizes to ensure no
entries leaked.
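A minimal Python sketch of the race and of the join()/shutdown() fix, assuming a toy agent whose requests complete on background threads. `ToyAgent`, `send`, `pending_count`, and `check_no_leaks` are hypothetical names for illustration only, not PyTorch's RPC API or the test's actual code.

```python
import threading
import time


class ToyAgent:
    """Hypothetical stand-in for an RPC agent (not PyTorch's API).

    Each send() records a pending entry and completes it later on a
    background thread, mimicking an in-flight RPC.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # message_id -> payload for requests still in flight
        self._threads = []
        self._next_id = 0
        self._accepting = True

    def send(self, payload, delay=0.01):
        with self._lock:
            if not self._accepting:
                raise RuntimeError("agent is shut down")
            message_id = self._next_id
            self._next_id += 1
            self._pending[message_id] = payload
            worker = threading.Thread(target=self._complete, args=(message_id, delay))
            self._threads.append(worker)
            worker.start()  # started under the lock so join() never sees an unstarted thread
        return message_id

    def _complete(self, message_id, delay):
        time.sleep(delay)  # the "response" arrives some time later
        with self._lock:
            del self._pending[message_id]

    def join(self):
        # Block until every in-flight request has completed.
        with self._lock:
            workers = list(self._threads)
        for worker in workers:
            worker.join()

    def shutdown(self):
        # Refuse any further requests.
        with self._lock:
            self._accepting = False

    def pending_count(self):
        with self._lock:
            return len(self._pending)


def check_no_leaks(num_requests):
    agent = ToyAgent()
    for i in range(num_requests):
        agent.send(i)
    # Reading pending_count() here would be racy: completions may still be running.
    agent.join()
    agent.shutdown()
    return agent.pending_count()  # safe to assert on now
```

With the wait in place, `check_no_leaks(10)` returns 0 deterministically; dropping the `agent.join()` call reintroduces the flakiness, since the count then depends on how many background completions happen to have run.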
In addition, this PR adds a messageIdToTimeout map to look up the timeout
associated with a messageId, which ensures the correct entry is removed from
the map. The previous solution passed the expirationTime through the lambda,
but there is no guarantee that the lambda reads the response to the request
that was just sent.
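A minimal Python sketch of the two-map bookkeeping, assuming pending timeouts are bucketed by expiration time. Only the messageIdToTimeout idea comes from this PR; `TimeoutTracker`, `timeout_map`, `on_send`, `on_response`, and `expire` are hypothetical names, and this is not the agent's actual implementation.

```python
from collections import defaultdict


class TimeoutTracker:
    """Hypothetical sketch of timeout bookkeeping keyed two ways."""

    def __init__(self):
        # expiration time -> message ids expiring at that time (scanned for timeouts)
        self.timeout_map = defaultdict(list)
        # message id -> expiration time (lets a response find its own bucket)
        self.message_id_to_timeout = {}

    def on_send(self, message_id, expiration_time):
        self.timeout_map[expiration_time].append(message_id)
        self.message_id_to_timeout[message_id] = expiration_time

    def on_response(self, message_id):
        # Look up the expiration time via the id carried by the response,
        # not via a value captured when some request was sent.
        expiration_time = self.message_id_to_timeout.pop(message_id, None)
        if expiration_time is None:
            return False  # already timed out, or unknown id
        bucket = self.timeout_map[expiration_time]
        bucket.remove(message_id)
        if not bucket:
            del self.timeout_map[expiration_time]
        return True

    def expire(self, now):
        """Remove and return ids whose expiration time is <= now."""
        expired = []
        for t in sorted(k for k in self.timeout_map if k <= now):
            for mid in self.timeout_map.pop(t):
                self.message_id_to_timeout.pop(mid, None)
                expired.append(mid)
        return expired

    def sizes(self):
        return len(self.timeout_map), len(self.message_id_to_timeout)
```

Because `on_response` keys off the message id in the response itself, responses can arrive in any order and each still removes its own entry; once all requests have either responded or expired, `sizes()` is `(0, 0)`, which is the kind of leak check the test performs after shutdown.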
ghstack-source-id: 121412604
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26331585
fbshipit-source-id: a41e0534d7d4dfd240446e661e5541311931c7d7