pytorch
c19b9cd1 - Add torch::cuda::ncll::all2all (#45900)

Commit

3 years ago

Add torch::cuda::ncll::all2all (#45900)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes https://github.com/pytorch/pytorch/issues/42517

Here is the NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                    ^
    |                                    |
    --------> libc10d.a -----------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library, unless the `-whole-archive` option is used. Before https://github.com/pytorch/pytorch/pull/42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`|`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those into `libtorch_python.so`, which also resulted in linking other dependent symbols into the library.

This PR adds the `nccl[Send|Recv]` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids double linking the static library.

A more involved, but less error-prone, solution would be to use wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS

Reviewed By: mingzhe09088

Differential Revision: D24138011

Pulled By: malfet

fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
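For readers unfamiliar with the collective, the data movement that an `all2all` performs can be sketched on the host. This is not the code added by the commit: it uses no NCCL calls, the function name `simulate_all2all` and the `chunk` parameter are hypothetical, and each inner copy only stands in for what would presumably be a paired `ncclSend`/`ncclRecv` between two ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of an all-to-all exchange (illustrative assumption, no NCCL).
// send[r] is rank r's send buffer, split into world_size chunks of `chunk`
// elements each; chunk p of rank r is delivered to rank p.
std::vector<std::vector<int>> simulate_all2all(
    const std::vector<std::vector<int>>& send, std::size_t chunk) {
  const std::size_t world_size = send.size();
  std::vector<std::vector<int>> recv(
      world_size, std::vector<int>(world_size * chunk));
  for (std::size_t rank = 0; rank < world_size; ++rank) {
    // Every rank must provide exactly one chunk per peer.
    assert(send[rank].size() == world_size * chunk);
    for (std::size_t peer = 0; peer < world_size; ++peer) {
      // Stand-in for a point-to-point transfer: `rank` sends its chunk
      // number `peer`, and `peer` stores it in slot number `rank`.
      for (std::size_t i = 0; i < chunk; ++i) {
        recv[peer][rank * chunk + i] = send[rank][peer * chunk + i];
      }
    }
  }
  return recv;
}
```

With two ranks and one element per chunk, send buffers `{0, 1}` and `{10, 11}` become receive buffers `{0, 10}` and `{1, 11}`: every rank ends up with one chunk from each peer, ordered by sender rank.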

Author

malfet

Committer

facebook-github-bot

Parents

ef4817fe
