pytorch
30742193 - Enable pg_nccl to perform vector AllGather for uneven output splits (#83713)

Enable pg_nccl to perform vector AllGather for uneven output splits (#83713)

Pushing PR on behalf of @aashaka
To replace: https://github.com/pytorch/pytorch/pull/82835

Summary:
A vector all_gather requires each process to gather the other processes' inputs into an output tensor according to the output list provided. Internally, pg_nccl.allgather will coalesce a list of pg_nccl._broadcast_oop calls to implement a vector all-gather when any shape in the output list differs. Otherwise, it will perform a ncclAllGather as usual.

- This change adds an out-of-place `_broadcast_oop` function to ProcessGroupNCCL. It allows broadcasting an input tensor and placing the output in a separate output tensor. Since allgather provides an out-of-place API, an allgather_v semantic implemented inside `pg_nccl.allgather` also needs to support out-of-place operation, which in turn requires adding an out-of-place broadcast.

Test Plan: Added a new test `test_all_gather_v_cuda` for all_gather_v to `distributed_nccl_spawn`.

Differential Revision: D37735263

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83713
Approved by: https://github.com/mingzhe09088
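A minimal sketch of the all_gather_v usage this change enables, assuming a multi-GPU host launched with torchrun and the NCCL backend; the tensor sizes and the `main` structure are illustrative, not taken from the PR or its test:

```python
# Hypothetical usage sketch: all_gather with uneven per-rank shapes on NCCL.
# Because the output shapes differ across ranks, ProcessGroupNCCL coalesces
# out-of-place broadcasts (_broadcast_oop) instead of issuing one ncclAllGather.
import torch
import torch.distributed as dist


def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank)

    # Each rank contributes a tensor of a different length (rank + 1 elements).
    input_tensor = torch.full((rank + 1,), float(rank), device="cuda")

    # Output list whose entries match each rank's (uneven) input shape.
    output_list = [torch.empty(r + 1, device="cuda") for r in range(world_size)]

    dist.all_gather(output_list, input_tensor)

    if rank == 0:
        print([t.tolist() for t in output_list])

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with, for example, `torchrun --nproc_per_node=2 this_script.py`; rank 0 should print one gathered tensor per rank, each with its own length.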