Add timeout injection to faulty agent for testing (#37485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37485
Adds arbitrary timeout injection to the faulty RPC agent. This makes it easier to test scenarios that depend on long-running RPCs, such as properly testing RPC timeouts and the profiler.
This is done by overriding ProcessGroupAgent's `enqueueSend()` function to inject the timeout. Which messages to time out is determined similarly to the existing `faulty_messages`: the user specifies a mapping from message to timeout.
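The mechanism described above can be illustrated with a minimal, self-contained Python sketch. This is not the agent's actual C++ code or API: `BaseAgent`, `DelayInjectingAgent`, `delays_by_message`, and the message names used below are hypothetical stand-ins, and the base class only records messages instead of sending them.

```python
import time


class BaseAgent:
    """Hypothetical stand-in for the underlying agent; records messages instead of sending."""

    def __init__(self):
        self.sent = []

    def enqueue_send(self, message, payload):
        self.sent.append((message, payload))


class DelayInjectingAgent(BaseAgent):
    """Hypothetical faulty agent: delays configured messages before handing them on."""

    def __init__(self, delays_by_message, sleep=time.sleep):
        super().__init__()
        # User-specified mapping of message -> delay in seconds (an assumed shape).
        self.delays_by_message = dict(delays_by_message)
        # The sleep function is injectable so tests do not have to wait in real time.
        self._sleep = sleep

    def enqueue_send(self, message, payload):
        # Look up the configured delay; messages not in the mapping are sent immediately.
        delay = self.delays_by_message.get(message, 0.0)
        if delay > 0:
            self._sleep(delay)
        super().enqueue_send(message, payload)
```

With a delay longer than the caller's RPC timeout, a test can assert that the call times out; messages absent from the mapping pass through unchanged.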
Added unit tests that verify RPC timeouts work with builtin and TorchScript functions, which were not tested before.
ghstack-source-id: 103341662
Test Plan: Added unit tests in `FaultyRpcAgentTest`.
Differential Revision: D21296537
fbshipit-source-id: 1dbc21aee14e49780272634e9cbb2b5a448f2896