pytorch
42b4d0e9 - [caffe2] remove unnecessary RCCL dependency

Commit
2 years ago
Summary:
RCCL is required by two components in hipified PyTorch: (1) gloo and (2) hipified ProcessGroupNCCL.

- For (1), the RCCL dependency is managed in `./third_party/gloo/cmake/Dependencies.cmake` and can be enabled or disabled via `USE_RCCL`.
- For (2), the RCCL dependency is managed in `./cmake/Dependencies.cmake` and can be turned on or off via `USE_NCCL`.

The additional dependency removed in this commit forced hipified PyTorch to load librccl.so even when `USE_RCCL=OFF` and `USE_NCCL=OFF` are set, i.e., when using torch_ucc/ucc for the AMD GPU memory type. This caused conflicts when a non-system-default librccl.so (i.e., one not in `ROCM_PATH`) was used for torch_ucc/ucc.

This commit removes the unnecessary RCCL dependency, which makes it cleaner to use torch_ucc with a user-specified RCCL library.

Test Plan:

## Verify OSS PyTorch on an AMD GPU machine (MI100)

```
export ROCM_PATH=/opt/rocm-4.5.2
git clone https://github.com/pytorch/pytorch.git
cd pytorch
python3 tools/amd_build/build_amd.py
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install
```

- log for develop: P492778257
- log for install: P492778277

## Verify OSS PyTorch + TorchUCC on an AMD GPU machine (MI100)

```
export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4
git clone https://github.com/facebookresearch/torch_ucc.git
cd torch_ucc
UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py

# run param comm
export HSA_ENABLE_SDMA=0
export LD_LIBRARY_PATH=$RCCL_INSTALL_DIR
cd test
git clone https://github.com/facebookresearch/param
cd ..
/bin/bash ./test/start_test.sh ./test/param/train/comms/pt/comms.py --backend ucc --device cuda --b 4 --e 4M --c 1 --collective all_reduce
```

- log for param comm: P493033836
- Verified that the librccl.so in `/opt/rccl-rocm-rel-4.4` is used by checking the version string in the log ("[localbuild]" was added to the RCCL source for this purpose):
```
RCCL version 2.9.9+hip4.4 [localbuild]
```

Differential Revision: D35476911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75547
Approved by: https://github.com/malfet, https://github.com/jeffdaily
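The checks behind this test plan can be scripted roughly as follows. They cover two things: confirming that the custom librccl.so was the one in use, and confirming the Summary's claim that a `USE_NCCL=0 USE_RCCL=0` build no longer pulls librccl.so into the process. This is a hedged sketch and not part of the original commit. The function names `rccl_version_from_log`, `is_local_rccl_build`, and `module_loads_rccl` are hypothetical. The sketch assumes the run log contains a line of the form `RCCL version <version>`, like the excerpt quoted above. It also assumes Linux (it reads `/proc/self/maps`) and a `python3` on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PyTorch, torch_ucc, or RCCL) for checking
# which RCCL library a run used.

# Print the version text from the first "RCCL version ..." line in a log file.
# Assumption: the log contains a line like "RCCL version 2.9.9+hip4.4 [localbuild]".
rccl_version_from_log() {
  local log_file="$1"
  grep -m1 -o 'RCCL version .*' "$log_file" | sed 's/^RCCL version //'
}

# Exit 0 if that version text carries the "[localbuild]" marker patched into the
# custom RCCL source, i.e. the user-specified librccl.so (not the ROCm one) was used.
is_local_rccl_build() {
  rccl_version_from_log "$1" | grep -q '\[localbuild\]'
}

# Import a Python module, then report whether any librccl shared object is mapped
# into that same Python process (Linux only: reads /proc/self/maps).
# Prints "yes", "no", or "error" (import failed or python3 unavailable).
module_loads_rccl() {
  local module="$1" maps
  if ! maps=$(python3 -c "import ${module}; print(open('/proc/self/maps').read())" 2>/dev/null); then
    echo "error"
    return 1
  fi
  if [[ $maps == *librccl* ]]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

For example, `module_loads_rccl torch` run against a build configured with `USE_NCCL=0 USE_RCCL=0` would be expected to print `no` once this dependency is gone. Reading the process memory map after the import is a deliberate choice over inspecting a single `.so` with `readelf -d`: it also catches librccl.so pulled in transitively through any other shared library.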