[CPU] SHM based allreduce improvement for small message size #5571
add profile for naive all_reduce
58302133
add multi parallel copy
fec2c9be
alternative multi-parallel memcpy
4c642a1c
use double buffer
ed748ce1
change naive all reduce to symmetric
e2312ec7
clean up
3e4b6c3f
don't use coll_begin set in naive_all_reduce
031b8310
separate buffer for different algorithms
2b15c220
turn off profile
d5865aa7
fix distributed naive allreduce
25882778
cleanup
2f694439
Remove profiling code
d1b2f098
add back original naive_all_reduce
0ba1f07c
remove naive_all_reduce
05fc2505
cleanup
7bc708d4
remove barrier which is not needed
af7d4fab
cleanup
b76937cb
can handle > 16 ranks efficiently
0da84b6e
Remove REPEAT
49c2153b
clean up state
a3cc1293
fix distributed allreduce perf
7b41d2f2
remove unnecessary state change
87accf4b
double buffer for distributed_naive_all_reduce
a1ff77e0
fix result error
8af61131
multiparallel copy #1
00a1c272
single omp region multi parallel copy
31b36439
add alternative path
8e5639eb
remove multi-memcpy, which actually causes a perf drop
c0733cbe
fix distributed accuracy issue
b7713b69
cleanup
3f088e4b
fix format
6c7ec551
Merge branch 'master' into gma/symmetric_naive_allreduce
dabae15d
Merge branch 'master' into gma/symmetric_naive_allreduce
f0634219
adk9 requested changes on 2024-06-10
Follow comments, remove unneeded code and syncs.
608cf7c1
adk9 approved these changes on 2024-06-12
Merge branch 'master' into gma/symmetric_naive_allreduce
1847a10d
fix format
7f614cb4
Merge branch 'master' into gma/symmetric_naive_allreduce
e1853f6e
adk9 merged commit eda5075b into master 1 year ago
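The commits "use double buffer" and "change naive all reduce to symmetric" name two ideas that can be illustrated in isolation. Below is a minimal toy model, not the PR's C++ code: Python threads stand in for ranks and a plain list stands in for the shared-memory region. Reading "symmetric" as "every rank reduces all slots itself instead of one rank reducing and then broadcasting" is an assumption, and the class `SymmetricNaiveAllReduce` and the helper `run_demo` are hypothetical names chosen for the sketch.

```python
import threading


class SymmetricNaiveAllReduce:
    """Hypothetical toy model: threads act as ranks, lists act as shared memory."""

    def __init__(self, world_size, count):
        self.world_size = world_size
        # Double buffering (assumed scheme): call k uses buffer set k % 2, so a
        # fast rank starting call k+1 never overwrites slots that slower ranks
        # are still reading for call k.
        self.buffers = [[[0.0] * count for _ in range(world_size)] for _ in range(2)]
        # threading.Barrier is reusable, so one barrier serves every call.
        self.barrier = threading.Barrier(world_size)

    def all_reduce(self, rank, data, call_index):
        buf = self.buffers[call_index % 2]
        buf[rank][:] = data   # each rank writes only its own slot
        self.barrier.wait()   # after this, every slot holds this call's data
        # Symmetric step: every rank sums all slots itself; no broadcast phase.
        return [sum(buf[r][i] for r in range(self.world_size)) for i in range(len(data))]


def run_demo(world_size=4, count=3, calls=3):
    ar = SymmetricNaiveAllReduce(world_size, count)
    results = [[None] * calls for _ in range(world_size)]

    def worker(rank):
        for k in range(calls):
            data = [float(rank + k + i) for i in range(count)]
            results[rank][k] = ar.all_reduce(rank, data, k)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo()[0])  # results seen by rank 0 for each call
```

With two alternating buffers, one barrier per call is enough. Before any rank can write buffer set k % 2 again at call k+2, it must pass the barrier of call k+1. Every rank reaches that barrier only after it has finished reading call k's results, so no rank is still reading that buffer set when it gets overwritten.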
Assignees
No one assigned