[resubmit] Providing more information while crashing process in async error handling (#47246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47246
In NCCL Async Error Handling, we crash the process if a collective has
been running for longer than the configured timeout. This PR reports more
information when that happens: the rank and how long the collective ran.
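For illustration only, a minimal Python sketch of the kind of diagnostic
described above. This is not the code in this PR: the helper name
`timeout_message`, its parameters, and the message wording are all
hypothetical, and the elapsed time is passed in rather than measured.

```python
from datetime import timedelta
from typing import Optional


def timeout_message(rank: int, elapsed: timedelta, timeout: timedelta) -> Optional[str]:
    """Hypothetical helper: describe a timed-out collective, or return None.

    A collective counts as timed out only when it has run for strictly
    longer than the timeout.
    """
    if elapsed <= timeout:
        return None
    # Integer milliseconds via timedelta floor division (no float rounding).
    one_ms = timedelta(milliseconds=1)
    elapsed_ms = elapsed // one_ms
    timeout_ms = timeout // one_ms
    return (
        f"[Rank {rank}] collective ran for {elapsed_ms} ms, "
        f"exceeding the timeout of {timeout_ms} ms"
    )
```

A caller would include the returned string in whatever error it raises
before aborting, so the log shows which rank stalled and for how long.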
ghstack-source-id: 116676182
Test Plan: Run desync tests and flow.
Reviewed By: pritamdamania87
Differential Revision: D24695126
fbshipit-source-id: 61ae46477065a1a451dc46fb29c3ac0073ca531b