DeepSpeed
caba320a
- fuse the all_to_all for the seq-parallel into one and use all_to_all_single
Commit
2 years ago
fuse the all_to_all for the seq-parallel into one and use all_to_all_single
References
#4695 - Communication Optimization for Large-Scale Training
Author
Reza Yazdani
Parents
970015bb