DeepSpeed b712babe - Correctness fix PP+ZeRO for gradient accumulation (#1264)
Commit
4 years ago
Correctness fix PP+ZeRO for gradient accumulation (#1264)

* pass GAS boundary state from PP -> ZeRO
* formatting

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
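
The idea named in the commit message, telling the sharded optimizer whether the current micro-batch closes a gradient accumulation window, can be illustrated with a small toy. This is only a sketch under stated assumptions, not DeepSpeed's implementation: `ToyShardedOptimizer`, `run_pipeline`, and the `is_boundary` flag are hypothetical names, gradients are plain floats, and "reduction" is modelled as appending to a list. Here GAS is read as gradient accumulation steps and PP as pipeline parallelism.

```python
class ToyShardedOptimizer:
    """Hypothetical stand-in for a ZeRO-style optimizer (not DeepSpeed's API)."""

    def __init__(self):
        self.is_boundary = False  # set by the pipeline driver before each backward
        self.accumulated = 0.0    # gradient accumulated over the current window
        self.reduced = []         # one entry per completed accumulation window

    def on_backward(self, grad):
        # Always accumulate; only "reduce" when the driver says this
        # micro-batch is the last one of the accumulation window.
        self.accumulated += grad
        if self.is_boundary:
            self.reduced.append(self.accumulated)
            self.accumulated = 0.0


def run_pipeline(optimizer, micro_batch_grads, gas):
    """Hypothetical pipeline driver that forwards the boundary state."""
    for i, grad in enumerate(micro_batch_grads, start=1):
        # The driver knows the schedule, so it passes the boundary
        # state down to the optimizer instead of letting it guess.
        optimizer.is_boundary = (i % gas == 0)
        optimizer.on_backward(grad)
    return optimizer.reduced


if __name__ == "__main__":
    print(run_pipeline(ToyShardedOptimizer(), [1.0, 2.0, 3.0, 4.0], gas=2))
```

In this toy, if the flag were never forwarded the optimizer would either reduce on every micro-batch or never reduce at all, which is the kind of mismatch a boundary hand-off is meant to avoid.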
References
#1264 - Correctness fix PP+ZeRO for gradient accumulation
Author
jeffra
Parents
bff6126f