pytorch
9b53d319 - Implement gather primitive for ProcessGroupNCCL (#66745)

Commit
2 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66745

This PR implements NCCL gather and adds gather to ProcessGroupNCCL using the NCCL send/recv API.

NCCL doesn't directly provide a gather primitive, so it needs to be implemented on top of NCCL's send/recv API.

1. In ProcessGroupNCCL.cpp, the outputTensors are first flattened, then inputTensors and outputFlattened are passed by the collective class to the gather() function in nccl.cpp.
2. In nccl.cpp, gather is implemented using ncclSend/ncclRecv: all the ranks send inputTensor to the root rank, and the root rank uses a for loop to receive these inputTensors.

ghstack-source-id: 147754838

Test Plan:
test_gather_ops
test_gather_checks
test_gather_stress

Reviewed By: pritamdamania87

Differential Revision: D29616361

fbshipit-source-id: b500d9b8e67113194c5cc6575fb0e5d806dc7782
(cherry picked from commit d560ee732eb559782a2d1d88b3cf118dcfc404bc)
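The two steps in the summary can be illustrated with a minimal, CPU-only sketch. It is not the code from nccl.cpp: `Mailbox` and `gather_via_send_recv` are hypothetical names, and the in-process queue merely stands in for ncclSend/ncclRecv so the example runs without GPUs or NCCL. It assumes every rank contributes an input of the same length, and it only loosely mirrors the flatten-then-split handling of the output tensors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-process transport standing in for ncclSend/ncclRecv.
// Messages are queued per (source rank, destination rank) pair.
struct Mailbox {
  std::map<std::pair<int, int>, std::deque<std::vector<float>>> queues;

  void send(int src, int dst, const std::vector<float>& data) {
    queues[{src, dst}].push_back(data);
  }

  std::vector<float> recv(int src, int dst) {
    auto& q = queues[{src, dst}];
    assert(!q.empty());  // a matching send must have been posted
    std::vector<float> data = q.front();
    q.pop_front();
    return data;
  }
};

// Gather built only from point-to-point operations.
// inputs[r] is rank r's input; all inputs are assumed to have equal length.
// Returns the per-rank outputs as seen by the root rank.
std::vector<std::vector<float>> gather_via_send_recv(
    const std::vector<std::vector<float>>& inputs, int root) {
  const int world_size = static_cast<int>(inputs.size());
  assert(root >= 0 && root < world_size);
  const std::size_t count = inputs[0].size();
  Mailbox transport;

  // Every rank sends its input to the root rank.
  for (int rank = 0; rank < world_size; ++rank) {
    assert(inputs[rank].size() == count);
    transport.send(rank, root, inputs[rank]);
  }

  // The root loops over all ranks and receives each input into its
  // slot of a single flattened output buffer.
  std::vector<float> flattened(world_size * count);
  for (int r = 0; r < world_size; ++r) {
    std::vector<float> chunk = transport.recv(r, root);
    std::copy(chunk.begin(), chunk.end(),
              flattened.begin() + static_cast<std::ptrdiff_t>(r * count));
  }

  // Split the flattened buffer back into one output per rank.
  std::vector<std::vector<float>> outputs(world_size);
  for (int r = 0; r < world_size; ++r) {
    auto first = flattened.begin() + static_cast<std::ptrdiff_t>(r * count);
    outputs[r].assign(first, first + static_cast<std::ptrdiff_t>(count));
  }
  return outputs;
}
```

Calling `gather_via_send_recv({{1, 2}, {3, 4}, {5, 6}}, 1)` yields the three inputs in rank order on root 1. In a real NCCL program the sends and receives would be issued on each rank's own process and stream, typically wrapped in ncclGroupStart/ncclGroupEnd; that part is outside the scope of this sketch.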