DeepSpeed
CPU SHM based inference_all_reduce improve
#5320
Merged


delock
delock move naive all reduce into separate function
65721e60
delock separate allreduce outer loop and inner loop into different functions
764aab89
delock skeleton for ring allreduce
e8c5ea37
delock interface finetune
0b404a57
delock initial ring allreduce implementation (no sync yet)
8cb9a548
delock ring allreduce can run (correctness not ensured)
448ee56f
delock change barrier to sync
5c90d8ff
delock change workspace to pointer array
d72477f9
delock fix minor error
43638256
delock better state handling for ring allreduce
8009bf67
delock fix accuracy error
19e46392
delock fix state handling
9d7bfa4a
delock separate buffer per rank (but will hang)
9de818d3
delock Merge branch 'master' into gma/ring_allreduce
7a3ddd4e
delock cleanup
9495274f
delock per rank SHM passed
5086e5e2
delock Using ring allreduce instead of naive allreduce
36b0259b
delock finetune buffer size and max number of ranks
4bccc34d
delock cleanup code
8f645038
delock fix hang with >2 ranks
fa4176f2
delock use ring_allreduce for bufsize >1MB only
f3255991
delock fix for 3 ranks
76731bcf
delock support fp32 in ring allreduce
d7bae071
delock use naive allreduce for message < 1MB
1b0ac0ac
delock remove unused functions
93852832
delock enable distributed_naive allreduce
aca819ad
delock pass 3 ranks
15a08f20
delock add shm.cpp
31bf2768
delock split shm based collective into separate file, no dep on oneCCL
93ebcfdc
delock remove unneeded header files
98328038
delock add timer to check variance at C++ level
aec55f22
delock add time profiling
88f1d0c1
delock requested a review from mrwyattii 2 years ago
delock requested a review from awan-10 2 years ago
delock requested a review from arashb 2 years ago
delock Merge branch 'master' into gma/shm_allreduce_improve
9f83015f
loadams Merge branch 'master' into gma/shm_allreduce_improve
bd32d272
loadams Formatting
ad33eee6
tjruwase Merge branch 'master' into gma/shm_allreduce_improve
42091d7f
tjruwase approved these changes on 2024-04-01
loadams merged 731fd682 into master 1 year ago
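The commit history above moves between a naive shared-memory all-reduce and a ring all-reduce, with a 1MB message size as the switch point. What follows is a minimal single-process sketch of those two algorithms and the size-based switch. It assumes each rank's buffer is an ordinary std::vector<float> rather than a shared-memory segment, and it ignores the cross-process synchronization that several commits deal with. The names Buffers, naive_all_reduce, ring_all_reduce, all_reduce and kRingThresholdBytes are hypothetical and are not taken from DeepSpeed's shm.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical switch point, taken from the "1MB" in the commit messages.
constexpr std::size_t kRingThresholdBytes = 1 << 20;

using Buffers = std::vector<std::vector<float>>;  // bufs[rank][element]

// Naive all-reduce: every rank ends up with the element-wise sum of all
// ranks' full buffers (in SHM, each rank would read its peers' buffers).
void naive_all_reduce(Buffers& bufs) {
    assert(!bufs.empty());
    const std::size_t count = bufs[0].size();
    std::vector<float> sum(count, 0.0f);
    for (const auto& b : bufs)
        for (std::size_t i = 0; i < count; ++i) sum[i] += b[i];
    for (auto& b : bufs) b = sum;
}

// Element range [begin, end) of chunk c when count elements are split into n chunks.
static void chunk_range(std::size_t c, std::size_t n, std::size_t count,
                        std::size_t& begin, std::size_t& end) {
    begin = c * count / n;
    end = (c + 1) * count / n;
}

// Ring all-reduce: reduce-scatter (n-1 steps), then all-gather (n-1 steps).
void ring_all_reduce(Buffers& bufs) {
    assert(!bufs.empty());
    const std::size_t n = bufs.size();
    const std::size_t count = bufs[0].size();
    if (n < 2) return;
    std::size_t b = 0, e = 0;
    // Reduce-scatter: at step s, rank r sends chunk (r - s) mod n to rank r+1,
    // which adds it in. Afterwards rank r owns the full sum of chunk (r+1) mod n.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        const Buffers snapshot = bufs;  // emulate all sends happening at once
        for (std::size_t r = 0; r < n; ++r) {
            chunk_range((r + n - s) % n, n, count, b, e);
            for (std::size_t i = b; i < e; ++i) bufs[(r + 1) % n][i] += snapshot[r][i];
        }
    }
    // All-gather: at step s, rank r forwards completed chunk (r + 1 - s) mod n,
    // and the receiver overwrites its copy.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        const Buffers snapshot = bufs;
        for (std::size_t r = 0; r < n; ++r) {
            chunk_range((r + 1 + n - s) % n, n, count, b, e);
            for (std::size_t i = b; i < e; ++i) bufs[(r + 1) % n][i] = snapshot[r][i];
        }
    }
}

// Size-based dispatch: naive below the threshold, ring at or above it.
void all_reduce(Buffers& bufs) {
    assert(!bufs.empty());
    const std::size_t bytes = bufs[0].size() * sizeof(float);
    if (bytes < kRingThresholdBytes)
        naive_all_reduce(bufs);
    else
        ring_all_reduce(bufs);
}
```

As an example, suppose three ranks hold {1,2,3,4,5}, {10,20,30,40,50} and {100,200,300,400,500}. After ring_all_reduce, all three hold {111,222,333,444,555}. In the ring variant each rank moves roughly 2(n-1)/n of the buffer, whereas in the naive variant every rank reads all n buffers. The ring variant, however, needs 2(n-1) dependent steps instead of one. That trade-off plausibly explains why the commits keep the naive path for small messages.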
