[CCLBackend] Using parallel memcpy for inference_all_reduce #4404
use parallel version of memcpy
11e1dcc3
include max buf size to 16MB per rank
0ede0af9
support any input buffer size
da784b9e
fix format error
e4ac57cc
Merge branch 'master' into gma/ccl_parallel_memcpy
35a602df
tjruwase approved these changes on 2023-10-03
tjruwase merged commit 9a552914 into master 2 years ago