DeepSpeed
[CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)
#3919
Merged

delock
delock use allreduce_low_latency for AutoTP and implement low latency allred…
3b7482d7
delock requested a review from RezaYazdaniAminabadi 2 years ago
delock requested a review from jeffra 2 years ago
delock requested a review from mrwyattii 2 years ago
delock requested a review from awan-10 2 years ago
delock requested a review from cmikeh2 2 years ago
delock requested a review from arashb 2 years ago
delock changed the title from "(CPU) Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)" to "[CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node)" 2 years ago
delock add fp32 support for SHM allreduce
77fe0078
delock avoid assertion for FP16 data type
6cdcd385
delock Merge branch 'up-master' into gma/ccl_low_latency
1249f9aa
delock fix format
0078b7e4
tjruwase
delock
delock change 'allreduce_low_latency' to 'inference_allreduce'
69bcb4f3
delock Merge branch 'master' into gma/ccl_low_latency
929fee15
tjruwase Merge branch 'master' into gma/ccl_low_latency
ed01c6d0
tjruwase
tjruwase commented on 2023-07-13
tjruwase
tjruwase commented on 2023-07-13
tjruwase
tjruwase commented on 2023-07-13
tjruwase
tjruwase commented on 2023-07-13
delock Fix according to comments
05b5f3e5
delock
delock change inference_allreduce to inference_all_reduce to keep naming con…
3b3fcabc
delock check whether LOCAL_SIZE is defined in ccl.cpp, also define LOCAL_SIZ…
26b38061
delock fix format
4c352a3d
delock Fix format error
bf5fc19b
tjruwase Merge branch 'master' into gma/ccl_low_latency
077a0bb4
delock Merge branch 'master' into gma/ccl_low_latency
c1324dad
mrwyattii
mrwyattii commented on 2023-07-17
delock Update tests/unit/comm/test_dist.py
7493074f
tjruwase Merge branch 'master' into gma/ccl_low_latency
8c602884
tjruwase Merge branch 'master' into gma/ccl_low_latency
866c4f09
tjruwase
tjruwase approved these changes on 2023-07-19
tjruwase merged 1bc3b784 into master 2 years ago
