DeepSpeed 8d901bfc - fix the gradient scale for when zero is not enabled
Commit
2 years ago
fix the gradient scale for when zero is not enabled
References
#4530 - Fix the sequence-parallelism for the dense model architecture
Author
Reza Yazdani
Parents
066644d7