pytorch
766eba60 - (torchx/elastic) honor NCCL_ASYNC_ERROR_HANDLING set from the env var (#73982)

Commit
2 years ago
(torchx/elastic) honor NCCL_ASYNC_ERROR_HANDLING set from the env var (#73982)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73982

Currently there is no way for users using torchelastic to override NCCL_ASYNC_ERROR_HANDLING=0. This PR enables this.

Test Plan:
Added unittests

Manual testing
```
$ torchx run fb.dist.ddp -- --img torchx_examples -m print_env_vars.py --env NCCL_ASYNC_ERROR_HANDLING=0
```

Validated the NCCL_ASYNC_ERROR_HANDLING in the process running `print_env_vars.py` is indeed `0`.

Reviewed By: mannatsingh, aivanou

Differential Revision: D34765786

fbshipit-source-id: 3f9f6d3b61e7d265adf689d387e020ab534c9259
(cherry picked from commit 2b787b46c6d37f049fe39eb64eecedf68799e75c)
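The behavior described in the commit message (a launcher that respects a user-supplied NCCL_ASYNC_ERROR_HANDLING instead of overwriting it) can be sketched roughly as follows. This is an illustrative sketch only, not the actual diff from the PR: the helper name `build_worker_env` and the fallback value `"1"` are assumptions made for the example.

```python
import os
from typing import Dict, Mapping, Optional


def build_worker_env(
    base_env: Optional[Mapping[str, str]] = None,
    default: str = "1",  # assumed fallback value, for illustration only
) -> Dict[str, str]:
    """Hypothetical helper: build the environment passed to a worker process.

    If NCCL_ASYNC_ERROR_HANDLING is already present in the incoming
    environment, that value is kept; otherwise the fallback is applied.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # setdefault only writes the key when it is absent, so a value
    # supplied by the user takes precedence over the fallback.
    env.setdefault("NCCL_ASYNC_ERROR_HANDLING", default)
    return env


# A user-supplied value is preserved; a missing one gets the fallback.
user_env = build_worker_env({"NCCL_ASYNC_ERROR_HANDLING": "0"})
fallback_env = build_worker_env({})
```

The point of using `setdefault` (or, equivalently, `os.getenv(name, fallback)`) is that the launcher only supplies a value when the user has not, which matches the manual test above where `--env NCCL_ASYNC_ERROR_HANDLING=0` reaches the worker unchanged.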
Author
Kiuk Chung