pytorch
ddf582da - Modify nccl_dependency to take dev mode (#79169)

Commit
2 years ago
Modify nccl_dependency to take dev mode (#79169)

Summary:
Modify nccl_dependency to take dev mode. Default is still the tp2 version.
Suggestions from D35919342 are added into this.

Test Plan:

NCCL TESTS

Using version dev:
Build: hpc_comms.use_nccl=dev
```
buck build mode/opt -c hpc_comms.use_nccl=dev -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/cpp/nccl-tests/src:nccl_allreduce_perf --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
/usr/local/fbcode/platform009/bin/mpirun -np 8 -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,GRAPH,TUNING,ENV,NET ./buck-out/gen/param_bench/train/comms/cpp/nccl-tests/src/nccl_allreduce_perf -b 8 -e 128M -f 2
```
Result: P507192135 - nccl version from logs "NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version dev_v2.10.3-1:
Build: hpc_comms.use_nccl=dev_v2.10.3-1
```
buck kill && buck clean && buck build mode/opt -c hpc_comms.use_nccl=dev_v2.10.3-1 -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/cpp/nccl-tests/src:nccl_allreduce_perf --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
/usr/local/fbcode/platform009/bin/mpirun -np 8 -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,GRAPH,TUNING,ENV,NET ./buck-out/gen/param_bench/train/comms/cpp/nccl-tests/src/nccl_allreduce_perf -b 8 -e 128M -f 2
```
Result: P507194570 - nccl version from logs "NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version tp2:
Build: hpc_comms.use_nccl=tp2
```
buck kill && buck clean && buck build mode/opt -c hpc_comms.use_nccl=tp2 -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/cpp/nccl-tests/src:nccl_allreduce_perf --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
/usr/local/fbcode/platform009/bin/mpirun -np 8 -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,GRAPH,TUNING,ENV,NET ./buck-out/gen/param_bench/train/comms/cpp/nccl-tests/src/nccl_allreduce_perf -b 8 -e 128M -f 2
```
Result: P507195497 - nccl version from logs "NCCL version 2.10.3+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version default:
Build: hpc_comms.use_nccl=tp2
```
buck kill && buck clean && buck build mode/opt -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/cpp/nccl-tests/src:nccl_allreduce_perf --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
/usr/local/fbcode/platform009/bin/mpirun -np 8 -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,GRAPH,TUNING,ENV,NET ./buck-out/gen/param_bench/train/comms/cpp/nccl-tests/src/nccl_allreduce_perf -b 8 -e 128M -f 2
```
Result: P507207374 - nccl version from logs "NCCL version 2.10.3+cudaCUDA_MAJOR.CUDA_MINOR"

--------

RUNNING PARAM COMMS TO TEST CAFFE TORCH INTEGRATION WITH NCCL DEV LIB

Using version dev:
Build: hpc_comms.use_nccl=dev
```
buck kill && buck clean && buck build mode/opt -c fbcode.platform=platform009 -c hpc_comms.use_nccl=dev -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/pt:comms --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
sh ai_codesign/comms/scripts/test_param_local_no_mpi.sh -s 8 --backend nccl --coll all_reduce
```
Result: P507214467 - nccl version from logs "NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version dev_v2.10.3-1:
Build: hpc_comms.use_nccl=dev_v2.10.3-1
```
buck kill && buck clean && buck build mode/opt -c fbcode.platform=platform009 -c hpc_comms.use_nccl=dev_v2.10.3-1 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/pt:comms --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
sh ai_codesign/comms/scripts/test_param_local_no_mpi.sh -s 8 --backend nccl --coll all_reduce
```
Result: P507247559 - nccl version from logs "NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version tp2:
Build: hpc_comms.use_nccl=tp2
```
buck kill && buck clean && buck build mode/opt -c fbcode.platform=platform009 -c hpc_comms.use_nccl=tp2 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/pt:comms --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
sh ai_codesign/comms/scripts/test_param_local_no_mpi.sh -s 8 --backend nccl --coll all_reduce
```
Result: P507251808 - nccl version from logs "NCCL version 2.10.3+cudaCUDA_MAJOR.CUDA_MINOR"

--------

Using version default:
Build: hpc_comms.use_nccl=tp2
```
buck kill && buck clean && buck build mode/opt -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=v100,a100 //param_bench/train/comms/pt:comms --show-full-output --verbose 1
```
Build done successfully

Running test on devgpu:
```
sh ai_codesign/comms/scripts/test_param_local_no_mpi.sh -s 8 --backend nccl --coll all_reduce
```
Result: P507256357 - nccl version from logs "NCCL version 2.10.3+cudaCUDA_MAJOR.CUDA_MINOR"

Differential Revision: D36873694

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79169
Approved by: https://github.com/kingchc, https://github.com/kwen2501
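
Each run in the test plan is checked by reading the NCCL version banner from the logs: the dev builds print "2.10.3dev+cuda...", while tp2 and the default print "2.10.3+cuda...". A minimal sketch of automating that check follows. It is not part of this commit. The helper name `classify_nccl_banner` is hypothetical, and the regular expression assumes the banner has exactly the shape quoted in the results above (a three-part version, an optional `dev` suffix, then `+cuda`).

```python
import re

# Assumed banner shape, taken from the log lines quoted in the test plan, e.g.
# "NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR".
_BANNER = re.compile(r"NCCL version (\d+)\.(\d+)\.(\d+)(dev)?\+cuda")


def classify_nccl_banner(log_text):
    """Return (version, is_dev) for the first NCCL banner in log_text, or None.

    Hypothetical helper: `version` is the dotted version string and `is_dev`
    is True when the banner carries the `dev` suffix.
    """
    match = _BANNER.search(log_text)
    if match is None:
        return None
    version = ".".join(match.group(1, 2, 3))
    return version, match.group(4) is not None


if __name__ == "__main__":
    # Banner strings copied from the results in the test plan.
    print(classify_nccl_banner("NCCL version 2.10.3dev+cudaCUDA_MAJOR.CUDA_MINOR"))
    print(classify_nccl_banner("NCCL version 2.10.3+cudaCUDA_MAJOR.CUDA_MINOR"))
```

With a check like this, a run built with hpc_comms.use_nccl=dev or dev_v2.10.3-1 would be expected to report `is_dev` as True, and a tp2 or default build False. `search` is used instead of `match` so that any prefix a logger puts in front of the banner does not matter.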
Author
Pavani Panakanti