pytorch
0128eb9a - Fix TSAN issue in distributed tests (#59238)

Commit
3 years ago
Fix TSAN issue in distributed tests (#59238)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59238

Creating a `multiprocessing.Manager()` launches a new process using the `fork` method (because it's the default one), and then in that subprocess it launches a new thread. TSAN really doesn't like this (and rightly so!) because we already had threads in the parent process, and mixing threads and forks is dangerous. The proper way to deal with this is to `exec` inside the child process or, in other words, use the `spawn` method.

Note that the method used to launch the Manager is entirely unrelated to the method used to launch our "own" subprocesses, hence we were using `fork` for the Manager even though we were using `spawn` for our own subprocesses.

ghstack-source-id: 130240724

Test Plan: Reverted the silencing introduced in D28490129, ran the `test_init_rpc_then_pg` test from the TensorPipe suite and saw the original TSAN failure. Then applied my fix, re-ran the test, and the failure was gone.

Reviewed By: zhaojuanmao

Differential Revision: D28794321

fbshipit-source-id: 12242e69be399a7f02a40a0ebb3d92f92e00ce73
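
For illustration only, here is a minimal sketch of starting a Manager with the `spawn` method instead of the platform default. It is not the actual patch from this commit; the helper names `spawn_context` and `make_shared_dict` are hypothetical. It relies on the standard-library `multiprocessing.get_context("spawn")`, whose returned context object exposes its own `Manager()` factory.

```python
import multiprocessing


def spawn_context():
    """Return a multiprocessing context that always uses the spawn start method.

    Hypothetical helper: the name is an assumption, not taken from the commit.
    """
    return multiprocessing.get_context("spawn")


def make_shared_dict():
    """Start a Manager server process via spawn and return (manager, shared dict).

    Using the context's Manager() avoids forking a parent that may already
    have running threads, which is the pattern TSAN flags.
    """
    manager = spawn_context().Manager()
    return manager, manager.dict()


if __name__ == "__main__":
    # The guard matters with spawn: the child re-imports the main module,
    # so unguarded top-level code would run again in the Manager process.
    manager, shared = make_shared_dict()
    shared["status"] = "ok"
    print(shared.copy())
    manager.shutdown()
```

The context object is independent of the global default, so this choice does not affect how other subprocesses in the same program are launched, which matches the commit's observation that the two start methods are unrelated.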
Author
lw