pytorch
789dc6d4 - [NCCL] Add more details for checkForNCCLErrors (#54117)

Commit
3 years ago
[NCCL] Add more details for checkForNCCLErrors (#54117)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54117

https://github.com/pytorch/pytorch/pull/45950 enhanced our NCCL error logging so that we add some basic debug information about what went wrong when erroring out with an NCCL error.

However, that PR only used the added function for `C10D_NCCL_CHECK`, which checks the return values of NCCL calls. ProcessGroupNCCL also has `checkForNCCLErrors`, which checks for errors on NCCL communicators, and it would be good to have this logging there too when an error occurs.

Also renames the function: s/errorMessage/getNcclErrorDetailStr

ghstack-source-id: 124662592

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D27100497

fbshipit-source-id: fec3663ffa3e92bae8391ef4f77054abb4bb9715
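For illustration only, the idea in the summary (append a human-readable detail string when a communicator check finds an error) can be sketched roughly as below. This is not the code from the PR: `FakeNcclResult`, `FakeNcclComm`, `getNcclErrorDetailStrSketch`, and `checkForNcclErrorsSketch` are hypothetical stand-ins so that the example compiles without NCCL or PyTorch, and the detail strings are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ncclResult_t; the real enum comes from <nccl.h>.
enum class FakeNcclResult { Success, UnhandledCudaError, SystemError, InternalError };

// Hypothetical stand-in for a communicator wrapper that remembers the last
// asynchronous error observed on it.
struct FakeNcclComm {
  explicit FakeNcclComm(FakeNcclResult err) : asyncError(err) {}
  FakeNcclResult asyncError;
};

// Simplified analogue of a "detail string" helper: maps an error code to a
// short hint. The wording here is illustrative, not PyTorch's actual text.
std::string getNcclErrorDetailStrSketch(FakeNcclResult error) {
  // Precondition: only meaningful for failures.
  assert(error != FakeNcclResult::Success);
  switch (error) {
    case FakeNcclResult::UnhandledCudaError:
      return "a CUDA call made by the library failed";
    case FakeNcclResult::SystemError:
      return "a system or network call failed";
    case FakeNcclResult::InternalError:
      return "an internal consistency check failed";
    case FakeNcclResult::Success:
      break;
  }
  return "no additional details";
}

// Simplified analogue of a communicator check: scans every communicator and
// returns a message for the first one in an error state, or an empty string
// when all communicators are healthy.
std::string checkForNcclErrorsSketch(const std::vector<FakeNcclComm>& comms) {
  for (const FakeNcclComm& comm : comms) {
    if (comm.asyncError != FakeNcclResult::Success) {
      // Same detail helper that a return-value check could use, so both
      // error paths report consistent information.
      return "NCCL error: " + getNcclErrorDetailStrSketch(comm.asyncError);
    }
  }
  return std::string();
}
```

In this sketch, a caller would treat a non-empty return value as a failure and log or throw it; sharing one detail helper between the return-value check and the communicator check is the point the summary makes.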