DeepSpeed
4d269c6e
- Changing monitor loss to aggregate loss over gradient accumulation steps (#3428)
Commit
2 years ago
Changing monitor loss to aggregate loss over gradient accumulation steps (#3428)

* Changing monitor loss to aggregate loss over gas.
* Adding self.losses to engine constructor.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
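The idea in the commit message can be illustrated with a minimal sketch. It is not DeepSpeed's actual code. The class `LossMonitor`, its `backward` method, and the `events` list are hypothetical names introduced here; only the attribute name `losses` comes from the commit message. The sketch also assumes that each micro-batch loss is scaled by 1/gas before being added, so the reported value is the mean over one accumulation window.

```python
class LossMonitor:
    """Toy stand-in for a training engine (hypothetical, not DeepSpeed's API)."""

    def __init__(self, gradient_accumulation_steps):
        self.gas = gradient_accumulation_steps
        self.micro_steps = 0
        # Running aggregate, initialized in the constructor and reset at
        # every gradient accumulation boundary.
        self.losses = 0.0
        # (global_step, aggregated_loss) pairs that would be sent to a monitor.
        self.events = []

    def backward(self, micro_batch_loss):
        # Assumption: scale by 1/gas so the aggregate is the window mean.
        self.losses += micro_batch_loss / self.gas
        self.micro_steps += 1
        # Report once per accumulation window instead of once per micro-batch.
        if self.micro_steps % self.gas == 0:
            self.events.append((self.micro_steps // self.gas, self.losses))
            self.losses = 0.0


monitor = LossMonitor(gradient_accumulation_steps=4)
for loss in [4.0, 2.0, 2.0, 4.0, 1.0, 1.0, 1.0, 1.0]:
    monitor.backward(loss)
print(monitor.events)  # [(1, 3.0), (2, 1.0)]
```

Reporting only the last micro-batch loss would have logged 4.0 and 1.0 here; aggregating over the window logs 3.0 and 1.0, which reflects every micro-batch that contributed to the optimizer step.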
References
#3428 - Changing monitor loss to aggregate loss over gradient accumulation steps
Author
jomayeri
Parents
5979ece8