CPU SHM based inference_all_reduce improve #5320
move naive all reduce into separate function
65721e60
separate allreduce outer loop and inner loop into different functions
764aab89
skeleton for ring allreduce
e8c5ea37
interface finetune
0b404a57
initial ring allreduce implementation (no sync yet)
8cb9a548
ring allreduce can run (correctness not ensured)
448ee56f
change barrier to sync
5c90d8ff
change workspace to pointer array
d72477f9
fix minor error
43638256
better state handling for ring allreduce
8009bf67
fix accuracy error
19e46392
fix state handling
9d7bfa4a
separate buffer per rank (but will hang)
9de818d3
Merge branch 'master' into gma/ring_allreduce
7a3ddd4e
cleanup
9495274f
per rank SHM passed
5086e5e2
Using ring allreduce instead of naive allreduce
36b0259b
finetune buffer size and max number of ranks
4bccc34d
cleanup code
8f645038
fix hang with >2 ranks
fa4176f2
use ring_allreduce for bufsize >1MB only
f3255991
fix for 3 ranks
76731bcf
support fp32 in ring allreduce
d7bae071
use naive allreduce for message < 1MB
1b0ac0ac
remove unused functions
93852832
enable distributed_naive allreduce
aca819ad
pass 3 ranks
15a08f20
add shm.cpp
31bf2768
split shm based collective into separate file, no dep on oneCCL
93ebcfdc
remove unneeded header files
98328038
add timer to check variance at C++ level
aec55f22
add time profiling
88f1d0c1
Merge branch 'master' into gma/shm_allreduce_improve
9f83015f
Merge branch 'master' into gma/shm_allreduce_improve
bd32d272
Formatting
ad33eee6
Merge branch 'master' into gma/shm_allreduce_improve
42091d7f
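
For readers unfamiliar with the technique named in the commits above, here is a minimal single-process sketch of a ring allreduce (reduce-scatter followed by allgather) together with a size-based choice between the ring and naive paths. It is not the code from this PR: the real implementation works across processes over shared memory with explicit synchronization, which is omitted here. The names `kRingThresholdBytes`, `use_ring_allreduce`, and `ring_allreduce_sim` are hypothetical, and the handling of a message of exactly 1 MB is an assumption, since the commit messages only state ">1MB" and "< 1MB".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical threshold taken from the commit messages (1 MB).
constexpr std::size_t kRingThresholdBytes = std::size_t{1} << 20;

// Assumption: messages of exactly 1 MB take the ring path.
bool use_ring_allreduce(std::size_t message_bytes) {
    return message_bytes >= kRingThresholdBytes;
}

// Simulates a sum ring allreduce over per-rank buffers held in one process.
// bufs[r] stands in for rank r's buffer; all buffers must have equal length.
// On return every buffer holds the element-wise sum of the inputs.
void ring_allreduce_sim(std::vector<std::vector<float>>& bufs) {
    const std::size_t n = bufs.size();
    if (n < 2) return;  // nothing to reduce with a single rank
    const std::size_t len = bufs[0].size();

    // Chunk c covers indices [chunk_begin(c), chunk_begin(c + 1)).
    auto chunk_begin = [&](std::size_t c) { return c * len / n; };

    // Reduce-scatter: in step s, rank r passes chunk (r - s) mod n to its
    // right neighbour, which accumulates it. After n - 1 steps, rank r
    // holds the fully reduced chunk (r + 1) mod n.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + n - s) % n;
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = chunk_begin(c); i < chunk_begin(c + 1); ++i) {
                bufs[dst][i] += bufs[r][i];
            }
        }
    }

    // Allgather: in step s, rank r passes chunk (r + 1 - s) mod n to its
    // right neighbour, which overwrites its own copy with the reduced data.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + 1 + n - s) % n;
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = chunk_begin(c); i < chunk_begin(c + 1); ++i) {
                bufs[dst][i] = bufs[r][i];
            }
        }
    }
}
```

Within one step, the chunk a rank sends is never the chunk it receives, which is why the simulation can update buffers in place without a temporary copy. Each rank moves roughly 2(n-1)/n of the buffer in total regardless of rank count, which is the usual reason a ring is preferred for large messages while a simpler scheme wins on latency for small ones.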
tjruwase
approved these changes
on 2024-04-01
loadams
merged
731fd682
into master 1 year ago