Improve Error Message for Dist Autograd Context Cleanup Failure (#37255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37255
Improved the error message logged when Distributed Autograd Context cleanup fails: it now includes node information and the underlying error. The previous message also assumed that the failure was caused by too many RPCs failing, which is not necessarily the case.
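For illustration only, here is a minimal Python sketch of the shape of such a message. It is not the code from this PR: the function name, its parameters, and the message wording are all hypothetical and chosen only to show node information and the underlying error being reported together, without guessing at a cause.

```python
def format_cleanup_failure(context_id: int, node_id: int, underlying_error: str) -> str:
    """Build a log line for a failed dist autograd context cleanup.

    Hypothetical helper: the name, parameters, and wording are assumptions
    for illustration and do not reproduce the message added in the PR.
    """
    # Report which context and node were involved, then pass the underlying
    # error through verbatim instead of asserting a specific cause.
    return (
        f"Could not release Dist Autograd Context {context_id} "
        f"on node {node_id}: {underlying_error}"
    )
```

A caller would log `format_cleanup_failure(ctx_id, node, str(err))` at the point where the cleanup request fails, so the reader sees the real error rather than an assumed one.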
ghstack-source-id: 102867620
Test Plan: Ensured that Sandcastle/CI tests pass. Verified that the correct message is logged when this code path is executed in `test_backward_node_failure` and `test_backward_node_failure_python_udf`.
Differential Revision: D20950664
fbshipit-source-id: 267318187b7ef386930753c9679a5dfab6d87018