pytorch commit 59402f51 - Make init_method url appending step re-usable by both init_process_group and init_model_parallel(init_rpc) (#28226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28226

# Goal

The rendezvous step should be the first step not only for `init_process_group` but also for `init_model_parallel`. The roadblock is that `init_process_group` has a special step in which the `rank` and `world_size` arguments passed to `init_process_group(..)` are appended to the `init_method` URL string. We need to make this argument-appending step common and re-usable for both `init_process_group` and `init_model_parallel`.

# Solution

- Put the argument appending inside the `rendezvous` function.
- Remove the manual `init_method` URL construction and delegate that responsibility to the `rendezvous` function.
- Use the `rendezvous` function for any `RpcAgent`.

Test Plan:
```
buck test mode/dev-nosan caffe2/test:c10d
```
```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_invalid_names
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_worker_id
```
```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc -- test_sync_rpc
```
```
buck test mode/dev-nosan caffe2/torch/fb/rendezvous:zeus_test
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling -- test_single_trainer_multiple_pss
```

Differential Revision: D5524494

fbshipit-source-id: 50be58ec3c928621b0874b044ef4a1640534d8ef
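To make the described solution concrete, below is a minimal sketch (not the actual PyTorch implementation) of how a shared `rendezvous` entry point could fold the `rank`/`world_size` appending into the `init_method` URL, so that both `init_process_group` and the RPC initialization path reuse the same logic. The helper name `_append_rank_and_world_size` and the exact signatures are assumptions for illustration only.

```python
# Sketch only: hypothetical helper showing the "append rank/world_size to the
# init_method URL inside rendezvous" idea from the commit message.
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def _append_rank_and_world_size(init_method, rank, world_size):
    """Append rank and world_size as query parameters to an init_method URL,
    e.g. 'tcp://127.0.0.1:23456' -> 'tcp://127.0.0.1:23456?rank=0&world_size=2'."""
    parts = urlparse(init_method)
    query = parse_qs(parts.query)
    if rank != -1:
        query["rank"] = [str(rank)]
    if world_size != -1:
        query["world_size"] = [str(world_size)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def rendezvous(init_method, rank=-1, world_size=-1):
    # Single entry point: in this sketch, init_process_group and the RPC agent
    # setup would both call this instead of hand-building the URL themselves.
    url = _append_rank_and_world_size(init_method, rank, world_size)
    # In real code the fully-qualified URL would be handed to the chosen
    # rendezvous handler; here we just return it for demonstration.
    return url


# Example: both call sites rely on the same appending logic.
print(rendezvous("tcp://127.0.0.1:23456", rank=0, world_size=2))
# -> tcp://127.0.0.1:23456?rank=0&world_size=2
```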
Changed files:
  • test/dist_autograd_test.py
  • test/dist_utils.py
  • test/rpc_test.py
  • torch/distributed/distributed_c10d.py
  • torch/distributed/rendezvous.py
  • torch/distributed/rpc/__init__.py
  • torch/distributed/rpc/api.py