Megatron-DeepSpeed
1e77844c - Add option to normalize loss per target (#326)

Commit
3 years ago
Add option to normalize loss per target (#326)

* Tmp lossseq
* Efficient loss normalization
* Reuse variable
* Simplify division
* Add norm_target_loss arg
* Clarify loss on targets & remove kwarg
* Loss mask is already float
* Move norm to batch pipe
* Reshape loss mask
* Move view
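
The commit message does not include the diff, so the following is only a rough sketch of what "normalize loss per target" could mean, not the code from this commit. It assumes a float 0/1 loss mask of shape batch x sequence, and that normalizing means dividing each sequence's mask by its number of target tokens, so that every sequence carries equal weight in the masked loss. The function names `normalize_loss_mask` and `masked_mean` are hypothetical, and plain Python lists stand in for tensors.

```python
def normalize_loss_mask(loss_mask):
    """Divide each row of a 0/1 loss mask by that row's target-token count.

    Hypothetical helper: after this step each row with at least one target
    sums to 1, so long and short target spans weigh the same.
    """
    normalized = []
    for row in loss_mask:
        n_targets = sum(row)
        if n_targets == 0:
            # No targets in this sequence: it contributes nothing.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([m / n_targets for m in row])
    return normalized


def masked_mean(losses, loss_mask):
    """Weighted mean of per-token losses, with the mask as weights."""
    total_weight = sum(sum(row) for row in loss_mask)
    if total_weight == 0:
        raise ValueError("loss mask selects no tokens")
    weighted = sum(
        loss * m
        for loss_row, mask_row in zip(losses, loss_mask)
        for loss, m in zip(loss_row, mask_row)
    )
    return weighted / total_weight


# Two sequences: the first has 2 target tokens, the second has 3.
losses = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
mask = [[0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]

per_token = masked_mean(losses, mask)                        # mean over all target tokens
per_target = masked_mean(losses, normalize_loss_mask(mask))  # mean of per-sequence means
print(per_token, per_target)
```

With the plain mask, the sequence with more target tokens dominates the average; with the normalized mask, the result is the mean of the per-sequence means.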