pytorch
d6394183 - Add timeout injection to faulty agent for testing (#37485)

Commit
4 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37485

Adds arbitrary timeout injection to the faulty RPC agent. This makes it easier to test scenarios that involve long-running RPCs, such as properly testing RPC timeouts and the profiler in all scenarios.

This is done by overriding ProcessGroupAgent's `enqueueSend()` function to inject the timeout. Which messages get a timeout is determined the same way as for the existing `faulty_messages`: the user specifies a mapping from message type to timeout.

Added unit tests that verify RPC timeouts work with builtin and TorchScript functions, which were not tested before.

ghstack-source-id: 103341662

Test Plan: Added unit tests in `FaultyRpcAgentTest`.

Differential Revision: D21296537

fbshipit-source-id: 1dbc21aee14e49780272634e9cbb2b5a448f2896
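The idea behind this change can be illustrated with a minimal pure-Python sketch. It assumes a toy agent whose send path is a single overridable method; the names `Agent`, `FaultyAgent`, `enqueue_send`, `rpc_sync`, `messages_to_delay`, and the message type strings are hypothetical stand-ins, not the PyTorch API (the real change lives in the C++ `ProcessGroupAgent`). The faulty subclass looks up each message's type in a user-supplied mapping and, if present, hands the message to the normal send path only after the configured delay. As a result, a caller waiting with a shorter timeout observes a timeout.

```python
import threading
from concurrent.futures import Future, TimeoutError as FutureTimeout
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Message:
    msg_type: str  # hypothetical type names, e.g. "SCRIPT_CALL"
    payload: object
    response: Future = field(default_factory=Future)


class Agent:
    """Hypothetical stand-in for an RPC agent that delivers messages to a handler."""

    def __init__(self, handler: Callable[[object], object]):
        self.handler = handler

    def enqueue_send(self, msg: Message) -> None:
        # Deliver on a background thread so the caller is never blocked here.
        threading.Thread(target=self._deliver, args=(msg,), daemon=True).start()

    def _deliver(self, msg: Message) -> None:
        try:
            msg.response.set_result(self.handler(msg.payload))
        except Exception as exc:
            msg.response.set_exception(exc)

    def rpc_sync(self, msg_type: str, payload: object, timeout: float) -> object:
        msg = Message(msg_type, payload)
        self.enqueue_send(msg)
        # Raises concurrent.futures.TimeoutError if no reply arrives in time.
        return msg.response.result(timeout=timeout)


class FaultyAgent(Agent):
    """Overrides only the send path to inject per-message-type delays."""

    def __init__(self, handler: Callable[[object], object],
                 messages_to_delay: Dict[str, float]):
        super().__init__(handler)
        self.messages_to_delay = messages_to_delay  # message type -> seconds

    def enqueue_send(self, msg: Message) -> None:
        delay = self.messages_to_delay.get(msg.msg_type)
        if delay is None:
            Agent.enqueue_send(self, msg)
            return
        # Hand the message to the normal send path only after `delay` seconds.
        timer = threading.Timer(delay, Agent.enqueue_send, args=(self, msg))
        timer.daemon = True
        timer.start()


if __name__ == "__main__":
    agent = FaultyAgent(lambda x: x * 2, {"SCRIPT_CALL": 0.5})
    print(agent.rpc_sync("PYTHON_CALL", 21, timeout=1.0))  # 42 (not delayed)
    try:
        agent.rpc_sync("SCRIPT_CALL", 21, timeout=0.1)  # delayed 0.5 s
    except FutureTimeout:
        print("timed out")
```

Injecting the delay on the send side, rather than inside the handler, leaves the caller's own timeout logic untouched. The test therefore exercises the same code path a genuinely slow RPC would.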