[NCCL] Add Error log when ProcessGroupNCCL takes down process upon (#44988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44988
The new NCCL async error handling feature throws an exception from the
workCleanup thread if an NCCL operation encounters an error or times
out. This PR adds an error log so that it is clear to the user why the
training process crashed.
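
For illustration only, the log-then-throw pattern described above can be
sketched as follows. This is a simplified Python model, not the C++ code in
ProcessGroupNCCL; the names `Work`, `check_work`, and the log message text
are hypothetical and chosen just for this example.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("nccl_sketch")


@dataclass
class Work:
    """Hypothetical stand-in for an in-flight collective operation."""
    name: str
    started_at: float          # start time in seconds
    timeout_s: float           # allowed duration in seconds
    error: Optional[str] = None


def check_work(work: Work, now: float) -> None:
    """Log an error, then raise, if the operation failed or timed out."""
    if work.error is not None:
        reason = f"failed with error: {work.error}"
    elif now - work.started_at > work.timeout_s:
        reason = f"timed out after {work.timeout_s} s"
    else:
        return  # operation is still healthy, nothing to report

    # Emit the log first so the reason is visible even if the raised
    # exception ends up terminating the process.
    message = f"Operation {work.name} {reason}; taking down the process."
    logger.error(message)
    raise RuntimeError(message)
```

Logging before raising matters because an exception that escapes a
background thread may end the process without any other user-visible
explanation.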
ghstack-source-id: 114002493
Test Plan:
Verified that we see this error message when running with the desync
test.
Reviewed By: pritamdamania87
Differential Revision: D23794801
fbshipit-source-id: 16a44ce51f01531062167fb762a8553221363698