pytorch
d6a30e21 - Enable pg_nccl.reduce_scatter to perform vector ReduceScatter for uneven input splits (#82924)


Commit

2 years ago

Enable pg_nccl.reduce_scatter to perform vector ReduceScatter for uneven input splits (#82924)

Summary:
A vector reduce_scatter requires each process to reduce and scatter an input tensor according to the input list provided. Internally, pg_nccl.reduce_scatter coalesces a list of pg_nccl._reduce_oop calls to implement a vector reduce-scatter when the tensors in the input list differ in shape. Otherwise, it performs a ncclReduceScatter as usual.

- This change adds a `CoalescedWorkNCCL` class which encapsulates the WorkNCCL requests from coalesced operations. A `.wait()` on a CoalescedWorkNCCL request calls wait on each of the coalesced WorkNCCL requests.
- This change adds an out-of-place `_reduce_oop` function to ProcessGroupNCCL. It reduces an input tensor and places the result in a separate output tensor. Since reduce_scatter provides an out-of-place API, the reduce_scatter_v semantic implemented inside `pg_nccl.reduce_scatter` must also be out-of-place, which requires an out-of-place reduce.

Test Plan:
Added a new test `test_reduce_scatter_v_cuda` for reduce_scatter_v to `distributed_nccl_spawn`.

Differential Revision: D38478781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82924
Approved by: https://github.com/kwen2501
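The semantics described in the summary can be illustrated with a small single-process simulation. This is only a sketch in plain Python, not the ProcessGroupNCCL code: `reduce_oop`, `reduce_scatter_v`, `FakeWork`, and `CoalescedWork` are hypothetical names chosen for illustration, lists stand in for tensors, and the reduction is assumed to be a sum. It shows one out-of-place reduce per destination rank, with the resulting work handles grouped so that a single `wait()` waits on all of them.

```python
from typing import List


class FakeWork:
    """Hypothetical stand-in for a single asynchronous work request."""

    def __init__(self) -> None:
        self.completed = False

    def wait(self) -> bool:
        self.completed = True
        return True


class CoalescedWork:
    """Hypothetical wrapper: wait() waits on every coalesced request."""

    def __init__(self, works: List[FakeWork]) -> None:
        self.works = works

    def wait(self) -> bool:
        for work in self.works:
            work.wait()
        return True


def reduce_oop(inputs: List[List[float]], output: List[float]) -> FakeWork:
    """Sum the inputs elementwise into a separate output buffer."""
    for i in range(len(output)):
        output[i] = sum(buf[i] for buf in inputs)
    return FakeWork()


def reduce_scatter_v(input_lists: List[List[List[float]]]):
    """Simulate a vector reduce-scatter across all ranks in one process.

    input_lists[r][k] is the chunk that rank r contributes for rank k.
    Chunks for different destination ranks may have different lengths.
    Returns (outputs, work), where outputs[k] is what rank k receives.
    """
    world_size = len(input_lists)
    outputs: List[List[float]] = []
    works: List[FakeWork] = []
    for root in range(world_size):
        chunks = [input_lists[rank][root] for rank in range(world_size)]
        if len({len(chunk) for chunk in chunks}) != 1:
            raise ValueError(f"ranks disagree on the chunk size for rank {root}")
        out = [0.0] * len(chunks[0])
        # One out-of-place reduce per destination rank.
        works.append(reduce_oop(chunks, out))
        outputs.append(out)
    return outputs, CoalescedWork(works)


# Two ranks with uneven splits: rank 0 receives 1 element, rank 1 receives 3.
inputs = [
    [[1.0], [2.0, 3.0, 4.0]],        # contributed by rank 0
    [[10.0], [20.0, 30.0, 40.0]],    # contributed by rank 1
]
outputs, work = reduce_scatter_v(inputs)
work.wait()
print(outputs)  # [[11.0], [22.0, 33.0, 44.0]]
```

In this sketch every rank's result is computed in one process for clarity; in a real process group each rank would only hold its own input list and receive its own output chunk.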

Author

aashaka

Committer

pytorchmergebot

Parents

52be9082
