Fix error regex matching for node failure tests (#36620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36620
Sending to a node that has been shut down in ProcessGroupAgent can throw one of several possible exceptions. This PR updates `test_backward_node_failure` and `test_backward_node_failure_python_udf` to check for the right exceptions while waiting for the other nodes in the gang to fail.
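As a rough illustration of matching "one of several possible exceptions", a test can combine the candidate messages into a single alternation regex. This is a minimal sketch, not the code from this PR: the message fragments in `ASSUMED_SHUTDOWN_ERRORS` and the helper names `build_error_regex` and `is_expected_shutdown_error` are hypothetical placeholders, since the actual error strings are not listed here.

```python
import re

# Hypothetical message fragments, for illustration only. The real strings
# raised when sending to a shut-down node are not given in this description.
ASSUMED_SHUTDOWN_ERRORS = [
    "Connection reset by peer",
    "Connection closed by peer",
    "Encountered exception in ProcessGroupAgent",
]


def build_error_regex(fragments):
    """Join literal fragments into one alternation pattern.

    re.escape keeps characters such as '.' or ':' from being
    interpreted as regex syntax.
    """
    return "|".join(re.escape(fragment) for fragment in fragments)


def is_expected_shutdown_error(exc, pattern):
    """Return True if the exception message contains any expected fragment."""
    return re.search(pattern, str(exc)) is not None


pattern = build_error_regex(ASSUMED_SHUTDOWN_ERRORS)

# A message containing one of the fragments matches; an unrelated one does not.
matched = is_expected_shutdown_error(
    RuntimeError("recv failed: Connection reset by peer"), pattern
)
unmatched = is_expected_shutdown_error(ValueError("shape mismatch"), pattern)
```

The same pattern string could be passed to `unittest.TestCase.assertRaisesRegex`, which uses `re.search` and so also accepts a match anywhere in the message.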
ghstack-source-id: 102153944
Test Plan: Stress-tested `test_backward_node_failure` and `test_backward_node_failure_python_udf`. Both were previously completely broken. With this change, `test_backward_node_failure` is functional; `test_backward_node_failure_python_udf` is still flaky but fails infrequently. A follow-up change to make the latter test reliable is planned.
Differential Revision: D21027280
fbshipit-source-id: e85c2d219ee408483442bd9925fff7206c8efe4b