DeepSpeed
9a552914 - [CCLBackend] Using parallel memcpy for inference_all_reduce (#4404)

Commit
2 years ago
[CCLBackend] Using parallel memcpy for inference_all_reduce (#4404)

* use parallel version of memcpy
* include max buf size to 16MB per rank
* support any input buffer size
* fix format error
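The idea of a parallel memcpy that accepts any input buffer size can be illustrated with a minimal sketch. This is not the code from this commit: the function name `parallel_memcpy`, the `num_workers` parameter, and the use of Python threads are all assumptions made for illustration. It only shows how a copy can be split into contiguous slices, with the last slice covering the remainder when the size does not divide evenly.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_memcpy(dst, src, num_workers=4):
    """Copy all bytes of src into dst, one contiguous slice per task.

    Hypothetical helper for illustration only; returns the number of bytes copied.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Flat byte views so slicing works on plain byte offsets.
    src_view = memoryview(src).cast("B")
    dst_view = memoryview(dst).cast("B")

    nbytes = src_view.nbytes
    if dst_view.nbytes < nbytes:
        raise ValueError("destination buffer is smaller than source")
    if nbytes == 0:
        return 0

    # Ceiling division: slices still cover sizes that are not a
    # multiple of num_workers (the last slice is simply shorter).
    chunk = -(-nbytes // num_workers)

    def copy_slice(start):
        end = min(start + chunk, nbytes)
        dst_view[start:end] = src_view[start:end]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consume the iterator so any exception raised in a worker surfaces here.
        list(pool.map(copy_slice, range(0, nbytes, chunk)))

    return nbytes
```

The sketch demonstrates the partitioning logic only and makes no performance claim; a native implementation would typically run the slices on real parallel threads.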