DeepSpeed #5320: CPU SHM based inference_all_reduce improve (Merged)
Commits (36)
move naive all reduce into separate function (delock, 2 years ago)
separate allreduce outer loop and inner loop into different functions (delock, 2 years ago)
skeleton for ring allreduce (delock, 2 years ago)
interface finetune (delock, 2 years ago)
initial ring allreduce implementation (no sync yet) (delock, 2 years ago)
ring allreduce can run (correctness not ensured) (delock, 2 years ago)
change barrier to sync (delock, 2 years ago)
change workspace to pointer array (delock, 2 years ago)
fix minor error (delock, 2 years ago)
better state handling for ring allreduce (delock, 2 years ago)
fix accuracy error (delock, 2 years ago)
fix state handling (delock, 2 years ago)
separate buffer per rank (but will hang) (delock, 2 years ago)
Merge branch 'master' into gma/ring_allreduce (delock, 2 years ago)
cleanup (delock, 2 years ago)
per rank SHM passed (delock, 2 years ago)
Using ring allreduce instead of naive allreduce (delock, 2 years ago)
finetune buffer size and max number of ranks (delock, 2 years ago)
cleanup code (delock, 2 years ago)
fix hang with >2 ranks (delock, 2 years ago)
use ring_allreduce for bufsize >1MB only (delock, 2 years ago)
fix for 3 ranks (delock, 2 years ago)
support fp32 in ring allreduce (delock, 2 years ago)
use naive allreduce for message < 1MB (delock, 2 years ago)
remove unused functions (delock, 2 years ago)
enable distributed_naive allreduce (delock, 2 years ago)
pass 3 ranks (delock, 2 years ago)
add shm.cpp (delock, 2 years ago)
split shm based collective into separate file, no dep on oneCCL (delock, 2 years ago)
remove unneeded header files (delock, 2 years ago)
add timer to check variance at C++ level (delock, 2 years ago)
add time profiling (delock, 2 years ago)
Merge branch 'master' into gma/shm_allreduce_improve (delock, 2 years ago)
Merge branch 'master' into gma/shm_allreduce_improve (loadams, 2 years ago)
Formatting (loadams, 2 years ago)
Merge branch 'master' into gma/shm_allreduce_improve (tjruwase, 2 years ago)
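The commits above move from a naive SHM allreduce to a ring allreduce and finally keep both: ring allreduce for large buffers and naive allreduce for messages under 1MB. The sketch below illustrates the two algorithms and that size-based dispatch as a single-process Python simulation. It is not the code in shm.cpp. The function names (`naive_allreduce`, `ring_allreduce`, `allreduce`, `chunk_bounds`), the use of plain Python lists as stand-ins for per-rank shared-memory buffers, and sending a message of exactly 1MB to the ring path are all assumptions made for illustration.

```python
from typing import List

# Hypothetical threshold, taken from the commit message "use naive allreduce for message < 1MB".
RING_THRESHOLD_BYTES = 1 << 20


def naive_allreduce(bufs: List[List[float]]) -> List[List[float]]:
    """Every rank sums all ranks' buffers, as if reading each peer's SHM slot directly."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]


def chunk_bounds(n_elems: int, n_ranks: int, c: int) -> range:
    """Index range of chunk c when n_elems is split into n_ranks near-equal chunks."""
    base, extra = divmod(n_elems, n_ranks)
    start = c * base + min(c, extra)
    return range(start, start + base + (1 if c < extra else 0))


def ring_allreduce(bufs: List[List[float]]) -> List[List[float]]:
    """Reduce-scatter followed by allgather around a logical ring of ranks."""
    n = len(bufs)
    data = [list(b) for b in bufs]  # private copy per simulated rank
    size = len(data[0])
    # Reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot every rank's outgoing chunk first so all sends happen "simultaneously".
        sends = [((r - step) % n, [data[r][i] for i in chunk_bounds(size, n, (r - step) % n)])
                 for r in range(n)]
        for r in range(n):
            c, vals = sends[(r - 1) % n]  # receive from the left neighbour and accumulate
            for i, v in zip(chunk_bounds(size, n, c), vals):
                data[r][i] += v
    # Allgather: circulate the fully reduced chunks for another n-1 steps.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, [data[r][i] for i in chunk_bounds(size, n, (r + 1 - step) % n)])
                 for r in range(n)]
        for r in range(n):
            c, vals = sends[(r - 1) % n]  # receive a finished chunk and overwrite
            for i, v in zip(chunk_bounds(size, n, c), vals):
                data[r][i] = v
    return data


def allreduce(bufs: List[List[float]], elem_size: int = 4) -> List[List[float]]:
    """Naive allreduce for small messages, ring allreduce for large ones (assumed policy)."""
    nbytes = len(bufs[0]) * elem_size
    if nbytes < RING_THRESHOLD_BYTES:
        return naive_allreduce(bufs)
    return ring_allreduce(bufs)
```

The trade-off this illustrates: in the naive scheme each rank reads every peer's full buffer, which is cheap to coordinate for small messages, while the ring scheme has each rank send and receive only about 2(n-1)/n of the buffer in total, at the cost of 2(n-1) synchronized steps. That extra synchronization is one plausible reason a size threshold pays off.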