pytorch commit 59402f51 - Make init_method url appending step re-usable by both init_process_group and init_model_parallel(init_rpc) (#28226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28226

# Goal

The rendezvous step should be the first step not only for `init_process_group` but also for `init_model_parallel`. The roadblock is that `init_process_group` has a special step in which the `rank` and `world_size` arguments passed to `init_process_group(..)` are appended to the `init_method` URL string. We need to make this argument-appending step common and re-usable for both `init_process_group` and `init_model_parallel`.

# Solution

- Put the argument appending inside the `rendezvous` function.
- Remove the manual `init_method` URL construction and delegate that responsibility to the `rendezvous` function.
- Use the `rendezvous` function for any `RpcAgent`.

Test Plan:
```
buck test mode/dev-nosan caffe2/test:c10d
```
```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_invalid_names
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_worker_id
```
```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc -- test_sync_rpc
```
```
buck test mode/dev-nosan caffe2/torch/fb/rendezvous:zeus_test
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling -- test_single_trainer_multiple_pss
```

Differential Revision: D5524494

fbshipit-source-id: 50be58ec3c928621b0874b044ef4a1640534d8ef
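To make the described solution concrete, below is a minimal sketch (not the actual PyTorch implementation) of how a shared `rendezvous` entry point could fold the `rank`/`world_size` appending into the `init_method` URL, so that both `init_process_group` and the RPC initialization path reuse the same logic. The helper name `_append_rank_and_world_size` and the exact signatures are assumptions for illustration only.

```python
# Sketch only: hypothetical helper showing the "append rank/world_size to the
# init_method URL inside rendezvous" idea from the commit message.
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def _append_rank_and_world_size(init_method, rank, world_size):
    """Append rank and world_size as query parameters to an init_method URL,
    e.g. 'tcp://127.0.0.1:23456' -> 'tcp://127.0.0.1:23456?rank=0&world_size=2'."""
    parts = urlparse(init_method)
    query = parse_qs(parts.query)
    if rank != -1:
        query["rank"] = [str(rank)]
    if world_size != -1:
        query["world_size"] = [str(world_size)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def rendezvous(init_method, rank=-1, world_size=-1):
    # Single entry point: in this sketch, init_process_group and the RPC agent
    # setup would both call this instead of hand-building the URL themselves.
    url = _append_rank_and_world_size(init_method, rank, world_size)
    # In real code the fully-qualified URL would be handed to the chosen
    # rendezvous handler; here we just return it for demonstration.
    return url


# Example: both call sites rely on the same appending logic.
print(rendezvous("tcp://127.0.0.1:23456", rank=0, world_size=2))
# -> tcp://127.0.0.1:23456?rank=0&world_size=2
```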
Changed files:
  • test/dist_autograd_test.py
  • test/dist_utils.py
  • test/rpc_test.py
  • torch/distributed/distributed_c10d.py
  • torch/distributed/rendezvous.py
  • torch/distributed/rpc/__init__.py
  • torch/distributed/rpc/api.py