pytorch
8f663170 - [17/n][torch/elastic] Make torchelastic launcher compatible with the caffe2.distributed.launch (#55687)

3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55687

The diff makes sure that users can pass the following parameters:
* master_addr
* master_port
* node_rank
* use_env

The diff implements StaticTCPRendezvous, which creates a store with a listener on the agent with rank 0.

The diff modifies caffe2/rendezvous: if the worker process is launched by the torchelastic agent, the worker processes create a PrefixStore("worker/") from a TCPStore without a listener.

The diff adds macro functionality to torch/distributed/elastic/utils that helps resolve the local_rank parameter.

Test Plan: buck test mode/dev-nosan //pytorch/elastic/torchelastic/distributed/test:launch_test

Reviewed By: cbalioglu, wilson100hong

Differential Revision: D27643206

fbshipit-source-id: 540fb26feac322cc3ec0a989fe53324755ccc4ea
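The macro functionality mentioned above resolves a `local_rank` placeholder in the worker's command-line arguments. A minimal plain-Python sketch of that idea follows; the function name `substitute_macros` and the `${local_rank}` placeholder syntax are illustrative assumptions, not the exact torch API introduced by this diff:

```python
def substitute_macros(args, local_rank):
    """Replace the ${local_rank} placeholder in each CLI argument.

    Illustrative sketch: the real torch/distributed/elastic/utils
    helper may use different names and placeholder syntax.
    """
    return [arg.replace("${local_rank}", str(local_rank)) for arg in args]


# Each worker process gets its own local rank substituted into its args.
print(substitute_macros(["--local_rank=${local_rank}", "train.py"], 3))
# ['--local_rank=3', 'train.py']
```

This lets a launcher template a single argument list once and specialize it per worker, which is how legacy `caffe2.distributed.launch`-style scripts that expect a `--local_rank` flag can keep working under the elastic agent.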