DeepSpeed
4d269c6e - Changing monitor loss to aggregate loss over gradient accumulation steps (#3428)

Commit
2 years ago
Changing monitor loss to aggregate loss over gradient accumulation steps (#3428)

* Changing monitor loss to aggregate loss over gas.
* Adding self.losses to engine constructor.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
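
The commit message describes aggregating the loss across gradient accumulation steps (gas) before it reaches the monitor, instead of reporting only the last micro-step's value. The snippet below is a minimal sketch of that idea, not the code from this commit: the `LossAggregator` class and its `add` method are hypothetical names, and averaging (rather than summing) the accumulated value is an assumption, since the message only says "aggregate". Only the `losses` attribute name echoes the `self.losses` mentioned above.

```python
class LossAggregator:
    """Hypothetical helper illustrating loss aggregation over gas.

    This is NOT DeepSpeed's engine code; it only mirrors the idea of
    keeping a running `losses` value initialised in the constructor.
    """

    def __init__(self, gradient_accumulation_steps):
        if gradient_accumulation_steps < 1:
            raise ValueError("gradient_accumulation_steps must be >= 1")
        self.gas = gradient_accumulation_steps
        self.losses = 0.0      # running sum of micro-step losses
        self.micro_steps = 0   # number of micro-steps seen so far

    def add(self, loss):
        """Record one micro-step loss.

        Returns the mean loss over the accumulation window when an
        accumulation boundary is reached, otherwise None.
        (Taking the mean is an assumption made for this sketch.)
        """
        self.losses += float(loss)
        self.micro_steps += 1
        if self.micro_steps % self.gas != 0:
            return None
        mean_loss = self.losses / self.gas
        self.losses = 0.0  # reset for the next accumulation window
        return mean_loss


# Usage: with gas=2, one value is reported for every two micro-steps.
aggregator = LossAggregator(gradient_accumulation_steps=2)
reported = []
for micro_loss in [4.0, 2.0, 3.0, 1.0]:
    value = aggregator.add(micro_loss)
    if value is not None:
        reported.append(value)  # a real monitor would log this value
```

Resetting the running value at each boundary keeps every reported number tied to exactly one optimizer step, which is what makes the monitored curve comparable across different gas settings.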