pytorch
48ea7c80 - [C10d] Support subgroups (#59111)

[C10d] Support subgroups (#59111)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59111

Create a util function for initializing subgroups. By default, each subgroup contains all the ranks within a machine. This util function can be used by both local SGD and the SyncBatchNorm optimization.

Additionally, clang-format `distributed/__init__.py` after importing `_rank_not_in_group`, which is used by the unit test, and also clang-format `distributed_c10d.py`.

Note that this API does not accept an overall main group. Like the APEX API `create_syncbn_process_group` [here](https://nvidia.github.io/apex/_modules/apex/parallel.html), it always uses the global world size and should only be applied when CUDA is available.

Closes: https://github.com/pytorch/pytorch/issues/53962
ghstack-source-id: 130975027

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_group_size_exceeds_world_size
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_world_size_not_divisible_by_group_size
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_by_enumeration
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_by_enumeration_input_rank_exceeds_world_size
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_new_subgroups_overlap_not_allowed

Reviewed By: rohan-varma
Differential Revision: D28495672
fbshipit-source-id: fdcc405411dd409634eb51806ee0a320d1ecd4e0
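The rank-partitioning idea behind the util function can be sketched in plain Python. This is a simplified illustration under the assumptions suggested by the test names (group size must not exceed the world size, and the world size must be divisible by the group size), not the actual PyTorch implementation; `partition_ranks` is a hypothetical helper name:

```python
def partition_ranks(world_size: int, group_size: int) -> list:
    """Sketch (not the real implementation): slice global ranks into
    contiguous subgroups of ``group_size`` ranks each.

    By default the group size would be the number of ranks per machine,
    so each subgroup contains all the ranks within one machine.
    """
    if group_size > world_size:
        raise ValueError("group size exceeds world size")
    if world_size % group_size != 0:
        raise ValueError("world size is not divisible by group size")
    return [
        list(range(start, start + group_size))
        for start in range(0, world_size, group_size)
    ]

# e.g. 8 ranks on 2 machines with 4 GPUs each:
# partition_ranks(8, 4) -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In the actual API, each rank would additionally create the corresponding process groups (e.g. via `new_group`) from such a rank partition and keep a handle to the subgroup it belongs to.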
Author
Yi Wang