pytorch
66647038 - Implement backend-agnostic rpc._wait_all_workers() utility (#31888)

Commit
4 years ago
Implement backend-agnostic rpc._wait_all_workers() utility (#31888) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888 We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent. - Sort worker names. - Elect the first name as the leader in the ordered worker names. - Followers reports therir intent to synchronize to the leader. - Leader also reports to itself, when `_wait_all_workers()` called. - If all workers report their intent to proceed, leader send the command to every one to proceed. ghstack-source-id: 96386210 Test Plan: # Unit tests ``` buck test mode/dev-nosan //caffe2/test:rpc_fork buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id ``` # Stress runs ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` Differential Revision: D19290954 fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
Author
Parents
Loading