DeepSpeed — commit 9a552914
[CCLBackend] Using parallel memcpy for inference_all_reduce (#4404)
2 years ago
[CCLBackend] Using parallel memcpy for inference_all_reduce (#4404)
* use parallel version of memcpy
* include max buf size to 16MB per rank
* support any input buffer size
* fix format error
References: #4404 - [CCLBackend] Using parallel memcpy for inference_all_reduce
Author: delock
Parent: 1760627e