DeepSpeed
[CCLBackend] Using parallel memcpy for inference_all_reduce
#4404
Merged

delock
delock use parallel version of memcpy
11e1dcc3
delock include max buf size to 16MB per rank
0ede0af9
delock support any input buffer size
da784b9e
delock requested a review from RezaYazdaniAminabadi 2 years ago
delock requested a review from awan-10 2 years ago
delock requested a review from jeffra 2 years ago
delock requested a review from cmikeh2 2 years ago
delock requested a review from arashb 2 years ago
delock fix format error
e4ac57cc
delock Merge branch 'master' into gma/ccl_parallel_memcpy
35a602df
tjruwase approved these changes on 2023-10-03
tjruwase merged 9a552914 into master 2 years ago

