[rpc] Allow aborting the second call to RecvWork::wait() in ProcessGroupAgent::listenLoop (#36084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36084
https://github.com/pytorch/pytorch/pull/30330 added support for aborting the wait on the `RecvWork` created by `recvAnysource`, but that call only receives the preamble for the tensor. An additional call to `pg_->recv()` actually gets the tensor sent over the wire. This change makes that second call abortable from `::shutdown()` as well, which can be used to avoid hangs during ungraceful shutdown.
Added an internal test case in `ProcessGroupAgentTest` to ensure that an appropriate error message is raised when the second call is aborted.
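For illustration only, the two-stage receive and its abort path can be sketched as below. This is a minimal standalone model, not the code in this PR: `AbortableWork`, `Listener`, `receiveOne`, `waitTracked`, `runAbortDemo`, and the error string are all hypothetical names, and the sketch uses plain `std::condition_variable` rather than any c10d or Gloo API. The idea it shows is that the listener records whichever wait is currently pending (preamble or payload), so that shutdown can abort either one.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for a pending receive; not the real RecvWork.
class AbortableWork {
 public:
  // Blocks until the work is completed or aborted; throws if aborted.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return completed_ || aborted_; });
    if (aborted_) {
      throw std::runtime_error("receive aborted during shutdown");
    }
  }
  void complete() { finish(completed_); }
  void abort() { finish(aborted_); }

 private:
  void finish(bool& flag) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      flag = true;
    }
    cv_.notify_all();
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  bool completed_ = false;
  bool aborted_ = false;
};

// Hypothetical listener modelling the preamble-then-payload receive.
class Listener {
 public:
  // Returns an empty string on success, otherwise the error message.
  std::string receiveOne(std::shared_ptr<AbortableWork> preamble,
                         std::shared_ptr<AbortableWork> payload) {
    try {
      waitTracked(std::move(preamble));  // first wait: the preamble
      waitTracked(std::move(payload));   // second wait: the tensor itself
    } catch (const std::runtime_error& e) {
      return e.what();
    }
    return "";
  }

  // Aborts whichever wait is pending and rejects any later wait.
  void shutdown() {
    std::shared_ptr<AbortableWork> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shuttingDown_ = true;
      pending = pending_;
    }
    if (pending) {
      pending->abort();
    }
  }

 private:
  void waitTracked(std::shared_ptr<AbortableWork> work) {
    assert(work != nullptr);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (shuttingDown_) {
        throw std::runtime_error("receive aborted during shutdown");
      }
      pending_ = work;  // publish so shutdown() can reach this wait
    }
    work->wait();
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.reset();
  }
  std::mutex mutex_;
  std::shared_ptr<AbortableWork> pending_;
  bool shuttingDown_ = false;
};

// The preamble arrives but the payload never does; shutdown unblocks it.
std::string runAbortDemo() {
  auto preamble = std::make_shared<AbortableWork>();
  auto payload = std::make_shared<AbortableWork>();
  preamble->complete();
  Listener listener;
  std::string error;
  std::thread t([&] { error = listener.receiveOne(preamble, payload); });
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  listener.shutdown();
  t.join();
  return error;
}
```

In this sketch, tracking only the first wait would leave the thread blocked in the second one forever if the peer dies after sending the preamble. Publishing each wait before blocking on it, and checking the shutdown flag at publish time, closes that gap.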
ghstack-source-id: 101689402
Test Plan:
Added a test in `ProcessGroupAgentTest`. Also added a basic config that controls whether to abort the calls to `pg->recv()` and `pg->recvAnysource()` in `FailingWaitProcessGroupGloo`.
Run test binary:
```
buck build mode/dev-nosan //caffe2/torch/fb/distributed/thriftRpcBackend/test:ProcessGroupAgentTest --keep-going
~/fbcode/buck-out/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ProcessGroupAgentTest
```
P128567144
Differential Revision: D20632764
fbshipit-source-id: c0b3c391fd3e0ae711661ad99f309ee4d93f6582