pytorch
b6d31829 - [FSDP] Do not `sys.exit(0)` explicitly at end of unit test (#100645)

Commit
1 year ago
We are going to see if this closes https://github.com/pytorch/pytorch/issues/100641. The guess is that this might allow NCCL to be destroyed before Python finalizes, avoiding any issues with calling `pybind11::gil_scoped_release` like in [`destroy_nccl_comm`](https://github.com/pytorch/pytorch/blob/8994d9e6109c541a1d581c383e4de9ed68205d91/torch/csrc/cuda/python_nccl.cpp#L46).

Test plan:
```
CUDA_VISIBLE_DEVICES=0,7 numactl -C 2 python test/distributed/fsdp/test_fsdp_unshard_params.py -v -k test_with_grads_core --repeat 200 2>&1 | tee out
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100645
Approved by: https://github.com/rohan-varma, https://github.com/fegin
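As a rough illustration of why an explicit `sys.exit(0)` at the end of a test can matter, here is a minimal sketch in plain Python. It is not the FSDP test code and does not touch NCCL: `FakeComm`, `worker_with_exit`, `worker_without_exit`, and `run` are hypothetical names invented for this example. It only shows the general Python behavior that `sys.exit(0)` raises `SystemExit`, so any teardown placed after it in the same function never runs, whereas a function that returns normally reaches its teardown.

```python
import sys


class FakeComm:
    """Hypothetical stand-in for a resource that needs explicit teardown."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


def worker_with_exit(comm):
    # The test body would run here.
    sys.exit(0)  # raises SystemExit; nothing below this line executes
    comm.destroy()


def worker_without_exit(comm):
    # The test body would run here.
    comm.destroy()  # reached because the function returns normally


def run(worker):
    comm = FakeComm()
    try:
        worker(comm)
    except SystemExit:
        # Swallowed only so both variants can be compared in one process.
        pass
    return comm.destroyed


exit_result = run(worker_with_exit)
no_exit_result = run(worker_without_exit)
print(exit_result, no_exit_result)
```

In the sketch, only the variant without `sys.exit(0)` tears down its resource before control returns to the caller. Whether the same ordering effect explains the linked issue is the commit's stated guess, not something this example demonstrates.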