pytorch
e06bd8f3 - fsdp support create hybrid-sharded process group for custom backend (#100622)

FSDP creates communication groups for intra-node communication through dist.new_subgroups. Previously, dist.new_subgroups could only create groups based on the number of CUDA devices. However, issue #99706 removed the CUDA availability check, so a custom backend can now create groups based on the number of custom devices per node. This PR lets FSDP explicitly pass the number of devices per node when creating intra-node communication groups, instead of defaulting to the number of CUDA devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100622
Approved by: https://github.com/awgu
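To illustrate why the per-node device count matters, here is a minimal pure-Python sketch of the rank layout behind hybrid sharding. It assumes ranks are numbered node by node. Under that assumption, dist.new_subgroups(group_size=num_devices_per_node) partitions the world into contiguous blocks of ranks, one block per node (the intra-node sharding groups), and ranks that share the same local index across nodes form the inter-node replication groups. The helper `hybrid_shard_groups` is hypothetical and only computes rank lists. It does not call torch.distributed or reproduce FSDP's internal code.

```python
from typing import List, Tuple


def hybrid_shard_groups(
    world_size: int, num_devices_per_node: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (intra_node_groups, inter_node_groups) as lists of global ranks.

    Hypothetical helper: assumes node n owns ranks
    [n * num_devices_per_node, (n + 1) * num_devices_per_node).
    """
    if num_devices_per_node <= 0:
        raise ValueError("num_devices_per_node must be positive")
    if world_size % num_devices_per_node != 0:
        raise ValueError("world_size must be divisible by num_devices_per_node")
    num_nodes = world_size // num_devices_per_node

    # Intra-node (sharding) groups: one contiguous block of ranks per node,
    # the same partition that dist.new_subgroups(group_size=...) produces.
    intra = [
        list(range(n * num_devices_per_node, (n + 1) * num_devices_per_node))
        for n in range(num_nodes)
    ]
    # Inter-node (replication) groups: ranks with the same local index on every node.
    inter = [
        list(range(local, world_size, num_devices_per_node))
        for local in range(num_devices_per_node)
    ]
    return intra, inter


# Example: 2 nodes with 4 custom (non-CUDA) devices each.
intra, inter = hybrid_shard_groups(world_size=8, num_devices_per_node=4)
print(intra)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(inter)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

If the group size fell back to the CUDA device count on a machine with no CUDA devices, this partition could not be formed at all. Passing the custom device count explicitly produces the intended per-node layout.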