Catch exceptions in ProcessGroupAgent::enqueueSend and report them (#31023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31023
Adds support for catching exceptions in ProcessGroupAgent::enqueueSend and
reporting them through the future: the future is marked as completed with an
exception that describes the error. This can happen, for example, if the
receiving side aborts while the sender is sending the message. Previously, the
sender would hang until the timeout was hit, and the original exception would
be lost.
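The pattern described above can be sketched in Python as follows. This is an
illustration only, not the C++ code in this change: `enqueue_send`,
`failing_send`, and the use of `concurrent.futures` are all hypothetical
stand-ins, and the sketch completes the future on a successful send purely to
keep the example short.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def enqueue_send(pool, send_fn, message):
    """Hypothetical stand-in for an enqueueSend-style helper.

    Runs send_fn(message) on a worker thread and returns a future that
    carries any exception raised during the send.
    """
    future = Future()

    def task():
        try:
            send_fn(message)
        except Exception as exc:
            # Complete the future with the error so the caller sees it
            # right away instead of waiting for a timeout.
            future.set_exception(exc)
        else:
            # Simplification for this sketch: treat a successful send
            # as completion of the future.
            future.set_result(None)

    pool.submit(task)
    return future


def failing_send(message):
    # Simulates the receiving side aborting while the message is sent.
    raise ConnectionError("receiver aborted during send")


with ThreadPoolExecutor(max_workers=1) as pool:
    failed = enqueue_send(pool, failing_send, b"payload")
    ok = enqueue_send(pool, lambda message: None, b"payload")

# The original exception is preserved on the future.
error = failed.exception(timeout=5)
```

Without the `except` branch, the exception would be raised and dropped inside
the worker thread, `failed` would never complete, and a caller waiting on it
would only ever see a timeout.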
ghstack-source-id: 96498386
Test Plan: Added a relevant unit test: `test_sender_exceptions` in rpc_test.py
Differential Revision: D18901981
fbshipit-source-id: 08de26936c4ad45b837219a247088cbea644c04c