modify test_local_shutdown_with_rpc to not be flaky (#30837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30837
This test flaked very occasionally, with an error saying the RPC timed out.
This happened because one worker would still be waiting for the return value
of an RPC while another worker had already performed its local shutdown and
so never sent the response. This didn't show up in initial testing since the
flakiness is very rare (< 1/100 test runs). This diff fixes the issue by not
erroring if these RPCs time out. This is okay because, with a local shutdown,
we should not expect all outstanding RPCs to be completed: workers are free
to shut down without completing or waiting on outstanding work.
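
A minimal sketch of the "do not error on timeout" pattern, for illustration
only. It is not the code from this diff: it uses standard-library
concurrent.futures futures as a stand-in for RPC futures, and the helper name
wait_tolerating_timeouts is hypothetical.

```python
import concurrent.futures as cf


def wait_tolerating_timeouts(futures, timeout_s):
    """Collect results, treating a timeout as acceptable instead of a failure.

    Hypothetical helper: a peer that has already shut down locally may never
    answer, so a timed-out future is counted rather than raised.
    """
    results = []
    timed_out = 0
    for fut in futures:
        try:
            results.append(fut.result(timeout=timeout_s))
        except cf.TimeoutError:
            # Expected when the remote side has already gone away.
            timed_out += 1
    return results, timed_out


# Stand-ins: one "RPC" that completed, one whose peer never responded.
answered = cf.Future()
answered.set_result(7)
unanswered = cf.Future()

results, timed_out = wait_tolerating_timeouts([answered, unanswered], 0.01)
```

Any other exception still propagates, so only the timeout case is relaxed.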
ghstack-source-id: 95021672
Test Plan: Ran the test 1000 times to ensure that it is not flaky.
Differential Revision: D18775731
fbshipit-source-id: 21074e8b4b4bbab2be7b0a59e80cb31bb471ea46