Add NCCL_ASYNC_ERROR_HANDLING as an environment variable (#59109)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57878.
This adds `NCCL_ASYNC_ERROR_HANDLING` as a DDP-relevant environment variable and adds a check for it to the test `test_dump_DDP_relevant_env_vars()`. Note that the modified test checks only for the new variable. It still does not check for the other previously existing relevant environment variables that it did not already cover (e.g. `NCCL_BLOCKING_WAIT`).
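To illustrate what "dumping DDP-relevant environment variables" and testing for one of them looks like, here is a minimal standalone sketch. It is not PyTorch's implementation: `RELEVANT_ENV_VARS`, `dump_relevant_env_vars()` and `captured_dump()` are hypothetical names, and the variable list is an illustrative subset. The test pattern (capture stdout, then assert that the variable's name and value appear) is one plausible way such a check can be written.

```python
import contextlib
import io
import os

# Hypothetical, illustrative subset of DDP-relevant environment variables.
# The real list is maintained inside PyTorch and is longer.
RELEVANT_ENV_VARS = [
    "MASTER_ADDR",
    "MASTER_PORT",
    "WORLD_SIZE",
    "NCCL_BLOCKING_WAIT",
    "NCCL_ASYNC_ERROR_HANDLING",  # the variable this change registers
]


def dump_relevant_env_vars(names=RELEVANT_ENV_VARS):
    """Print one 'NAME: value' line per variable, using N/A when unset."""
    lines = ["env:"]
    for name in names:
        lines.append(f"{name}: {os.environ.get(name, 'N/A')}")
    print("\n".join(lines))


def captured_dump():
    """Run the dump with stdout redirected and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        dump_relevant_env_vars()
    return buf.getvalue()


if __name__ == "__main__":
    # A test in this style sets the variable, dumps, and checks the output.
    os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"
    assert "NCCL_ASYNC_ERROR_HANDLING: 1" in captured_dump()
    print(captured_dump(), end="")
```

Capturing stdout keeps the dump function unchanged for normal use while still letting a test inspect exactly what a user would see in the logs.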
The change was tested with the following command on an AI AWS cluster:
`WORLD_SIZE=2 BACKEND=nccl gpurun pytest test/distributed/test_distributed_spawn.py -k test_dump_DDP_relevant_env_vars -vs`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59109
Reviewed By: H-Huang, SciPioneer
Differential Revision: D28761148
Pulled By: andwgu
fbshipit-source-id: 7be4820e61a670b001408d0dd273f65029b1d2fe