[PyTorch Distributed] Add debug hint for NCCL async system error (#73897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73897
Add a debug hint noting that an NCCL async system error can be caused by the
unexpected exit of a remote process rather than by an actual network issue. For
example, when a remote process exits, a local process may see a closed network
connection error. The hint directs debugging toward the remote process.
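
A minimal sketch of the idea, not the code in this change: the helper name
`error_with_hint`, the message wording, and the string-based check are all
assumptions made for illustration. Only the result-code name `ncclSystemError`
comes from NCCL itself.

```python
# Hypothetical illustration; this is not PyTorch's actual implementation.
NCCL_SYSTEM_ERROR = "ncclSystemError"  # NCCL result-code name for system errors


def error_with_hint(result_name: str, detail: str) -> str:
    """Build an error message, appending a debug hint for system errors."""
    message = f"NCCL error: {result_name}: {detail}"
    if result_name == NCCL_SYSTEM_ERROR:
        # A closed connection seen locally is often a symptom of a remote
        # process exiting, so point the reader at the other ranks.
        message += (
            "\nHint: this may be a real network issue, or a remote process "
            "may have exited unexpectedly. Check the logs of the other ranks."
        )
    return message


if __name__ == "__main__":
    print(error_with_hint(NCCL_SYSTEM_ERROR, "connection closed by peer"))
```

Other error kinds are passed through unchanged, so the hint only appears where
a remote exit is a plausible cause.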
Test Plan: unit tests
Reviewed By: pritamdamania87, rohan-varma
Differential Revision: D34702348
fbshipit-source-id: d19f9116e9efe5f6d76c0158a7a447616437ca69
(cherry picked from commit 005e74b7b6764ecd832b3410063285bff2411b56)