pytorch
5a42a97c - Add NCCL_ASYNC_ERROR_HANDLING as an environment variable (#59109)

Commit
3 years ago
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57878.

This adds `NCCL_ASYNC_ERROR_HANDLING` as a DDP relevant environment variable and includes a check for that variable in the test `test_dump_DDP_relevant_env_vars()`. Notably, the modified test now checks for the new variable but does not check for any of the other previously existing relevant environment variables that were not already tested for (e.g. `NCCL_BLOCKING_WAIT`).

The change was tested via the following on an AI AWS cluster:
`WORLD_SIZE=2 BACKEND=nccl gpurun pytest test/distributed/test_distributed_spawn.py -k test_dump_DDP_relevant_env_vars -vs`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59109

Reviewed By: H-Huang, SciPioneer

Differential Revision: D28761148

Pulled By: andwgu

fbshipit-source-id: 7be4820e61a670b001408d0dd273f65029b1d2fe
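
For readers unfamiliar with the idea of dumping DDP-relevant environment variables, the following is a minimal standalone sketch of the pattern, not the PyTorch implementation. The list `RELEVANT_ENV_VARS`, the helper `format_relevant_env_vars`, the `env:NAME=VALUE` line format, and the `N/A` placeholder are all assumptions made for illustration; only `NCCL_ASYNC_ERROR_HANDLING`, `NCCL_BLOCKING_WAIT`, and `WORLD_SIZE` are names taken from the commit message.

```python
import os
from typing import Mapping, Optional

# Illustrative subset only (assumption); the authoritative list lives in
# PyTorch's DDP code and is longer than this.
RELEVANT_ENV_VARS = [
    "WORLD_SIZE",
    "NCCL_BLOCKING_WAIT",
    "NCCL_ASYNC_ERROR_HANDLING",  # the variable this commit adds
]


def format_relevant_env_vars(env: Optional[Mapping[str, str]] = None) -> str:
    """Return one 'env:NAME=VALUE' line per variable, 'N/A' when unset.

    Hypothetical helper: both the name and the output format are assumptions.
    """
    if env is None:
        env = os.environ
    lines = [f"env:{name}={env.get(name, 'N/A')}" for name in RELEVANT_ENV_VARS]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the current process's view of the variables.
    print(format_relevant_env_vars())
```

A test in the spirit of the one described above would set a variable, produce the dump, and assert that the variable's name appears in the output.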