pytorch
e8d2916b - Add faulty tensorpipe implementation (#61421)

Commit

3 years ago

Add faulty tensorpipe implementation (#61421)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61421

This PR adds the faulty tensorpipe agent implementation and replaces all faulty process group agent tests with it. The faulty tensorpipe agent code is very similar to that of the faulty process group agent. It allows the user to fail or delay certain types of RPC messages, which the faulty agent tests rely on. These changes are needed to deprecate the process group RPC backend.

Summary of changes:
- Add the faulty tensorpipe agent class
- Update the tensorpipe pipeWrite function so that it can be overridden and can add a delay
- Update the test backend registry and the faulty agent tests to use the FAULTY_TENSORPIPE_AGENT backend

This affects all faulty agent tests. A few of them, as sample commands:
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_verify_backend_options`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_no_faulty_messages`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_builtin_remote_message_dropped_timeout`

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29773739

Pulled By: H-Huang

fbshipit-source-id: 6b2bc366735d70b79943d4207f454bc9555bbf5f
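
The fail-or-delay behaviour described in the summary can be illustrated with a small, self-contained sketch. This is not the PyTorch implementation (the real agent is C++ and overrides pipeWrite); it only shows the general idea of wrapping a send function so that chosen message types are failed a fixed number of times or delayed. The class `FaultySender`, the exception `SendFailed`, the option names, and the message type strings are all hypothetical and chosen for illustration. Counting failures per message type is a simplifying assumption of this sketch.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class SendFailed(Exception):
    """Raised when the wrapper deliberately fails a send (hypothetical)."""


@dataclass
class FaultySender:
    """Wraps a send callable and injects failures or delays by message type.

    All names here are illustrative assumptions, not PyTorch APIs.
    """

    send: Callable[[str, bytes], None]
    # Message types whose first `num_fail_sends` sends are failed.
    messages_to_fail: Set[str] = field(default_factory=set)
    # Message type -> delay in seconds applied before each send.
    messages_to_delay: Dict[str, float] = field(default_factory=dict)
    num_fail_sends: int = 0
    _fail_counts: Dict[str, int] = field(default_factory=dict)
    # Injectable so tests can avoid real sleeping.
    sleep: Callable[[float], None] = time.sleep

    def __call__(self, msg_type: str, payload: bytes) -> None:
        if msg_type in self.messages_to_fail:
            count = self._fail_counts.get(msg_type, 0)
            if count < self.num_fail_sends:
                # Record the injected failure and drop the message.
                self._fail_counts[msg_type] = count + 1
                raise SendFailed(f"injected failure for {msg_type}")

        delay = self.messages_to_delay.get(msg_type, 0.0)
        if delay > 0:
            self.sleep(delay)

        # Message types that are not configured pass through unchanged.
        self.send(msg_type, payload)
```

A test could construct `FaultySender(send=real_send, messages_to_fail={"call"}, num_fail_sends=2)` and assert that the first two `"call"` sends raise `SendFailed` while the third goes through, which mirrors the kind of dropped-message and timeout scenarios the sample test names suggest.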

Author

H-Huang

Committer

facebook-github-bot

Parents

d856914c
