[NCCL] Enhance watchdog to log exceptions (#54557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54557
While looping through the NCCL communicator cache to check for errors, the watchdog now logs any exception that is set on a communicator.
This improves debuggability: when the watchdog detects errors on communicators and aborts them, the underlying NCCL error is now logged instead of being lost.
Tested by forcing an NCCL error with NCCL_BLOCKING_WAIT=1 and verifying that the exception is logged.
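A minimal Python sketch of one watchdog pass as described above: check every cached communicator, log any recorded exception, then abort the affected communicators. It assumes a hypothetical `FakeComm` class and a plain dict standing in for the communicator cache. PyTorch's actual watchdog is C++ code in `ProcessGroupNCCL`, and its names and log format differ.

```python
import logging
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger("nccl_watchdog_sketch")


class FakeComm:
    """Hypothetical stand-in for an NCCL communicator (not a PyTorch API)."""

    def __init__(self, name: str, error: Optional[Exception] = None) -> None:
        self.name = name
        self.error = error  # exception recorded on this communicator, if any
        self.aborted = False

    def check_for_error(self) -> Optional[Exception]:
        return self.error

    def abort(self) -> None:
        self.aborted = True


def watchdog_pass(comm_cache: Dict[str, List[FakeComm]]) -> List[str]:
    """Run one watchdog iteration over a hypothetical communicator cache.

    For every cache entry with at least one failed communicator, log each
    exception and then abort all communicators in that entry.
    Returns the cache keys whose communicators were aborted.
    """
    aborted_keys: List[str] = []
    for key, comms in comm_cache.items():
        failures: List[Tuple[FakeComm, Exception]] = []
        for comm in comms:
            exc = comm.check_for_error()
            if exc is not None:
                failures.append((comm, exc))
        if not failures:
            continue
        # The behavior this change adds: log the exception before aborting,
        # so the root cause appears in the logs.
        for comm, exc in failures:
            logger.error("NCCL error on communicator %s (key %s): %r", comm.name, key, exc)
        for comm in comms:
            comm.abort()
        aborted_keys.append(key)
    return aborted_keys


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    cache = {
        "0,1": [FakeComm("a", RuntimeError("unhandled system error")), FakeComm("b")],
        "2,3": [FakeComm("c")],
    }
    print(watchdog_pass(cache))
```

Logging happens before the abort so that the error text is still available when it is recorded; aborting the whole cache entry (not just the failed communicator) is a simplifying assumption of this sketch.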
ghstack-source-id: 125124310
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27106699
fbshipit-source-id: 1d2bd9f057a3796ce15dd8a4ce34cf6899eee45c