[CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node) #3919
Commits:

- use allreduce_low_latency for AutoTP and implement low latency allred… (3b7482d7)
- add fp32 support for SHM allreduce (77fe0078)
- avoid assertion for FP16 data type (6cdcd385)
- Merge branch 'up-master' into gma/ccl_low_latency (1249f9aa)
- fix format (0078b7e4)
- change 'allreduce_low_latency' to 'inference_allreduce' (69bcb4f3)
- Merge branch 'master' into gma/ccl_low_latency (929fee15)
- Merge branch 'master' into gma/ccl_low_latency (ed01c6d0)
- Fix according to comments (05b5f3e5)
- change inference_allreduce to inference_all_reduce to keep naming con… (3b3fcabc)
- check whether LOCAL_SIZE is defined in ccl.cpp, also define LOCAL_SIZ… (26b38061)
- fix format (4c352a3d)
- Fix format error (bf5fc19b)
- Merge branch 'master' into gma/ccl_low_latency (077a0bb4)
- Merge branch 'master' into gma/ccl_low_latency (c1324dad)
- Update tests/unit/comm/test_dist.py (7493074f)
- Merge branch 'master' into gma/ccl_low_latency (8c602884)
- Merge branch 'master' into gma/ccl_low_latency (866c4f09)

delock changed the title from "(CPU) Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)" to "[CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)" 2 years ago
tjruwase approved these changes on 2023-07-19
tjruwase merged 1bc3b784 into master 2 years ago
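The merged PR implements the single-node all-reduce in C++ over shared memory for the CPU backend; the commit log does not include the algorithm itself. As a toy illustration of the general idea (each local rank publishes its tensor into a per-rank slot in a shared region, then every rank sums all slots after a barrier), here is a hedged Python sketch. The function name `shm_all_reduce` and the thread-per-rank simulation are illustrative assumptions, not DeepSpeed's actual `inference_all_reduce` kernel.

```python
import threading
import numpy as np

def shm_all_reduce(buffers):
    """Toy single-node all-reduce over a shared slot table.

    Each 'rank' (simulated by a thread) writes its tensor into its own
    slot, waits at a barrier so all writes are visible, then sums every
    slot locally. This mirrors a flat shared-memory reduction scheme;
    it is NOT DeepSpeed's C++ implementation.
    """
    world = len(buffers)
    # Shared region: one slot per rank (threads share this array).
    slots = np.zeros((world,) + buffers[0].shape, dtype=buffers[0].dtype)
    barrier = threading.Barrier(world)
    out = [None] * world

    def rank_fn(rank):
        slots[rank] = buffers[rank]    # publish this rank's tensor
        barrier.wait()                 # wait until every rank has written
        out[rank] = slots.sum(axis=0)  # each rank reduces all slots

    threads = [threading.Thread(target=rank_fn, args=(r,)) for r in range(world)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

For example, with two ranks holding `[1.0, 2.0]` and `[3.0, 4.0]`, every rank ends up with `[4.0, 6.0]`. A flat scheme like this avoids the multi-step ring exchange of general all-reduce, which is why it can have lower latency for the small tensors typical of inference on a single node.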