Add option to normalize loss per target (#326)
* Tmp lossseq
* Efficient loss normalization
* Reuse variable
* Simplify division
* Add norm_target_loss arg
* Clarify loss on targets & remove kwarg
* Loss mask is already float
* Move norm to batch pipe
* Reshape loss mask
* Move view
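
For reference, a minimal framework-free sketch of what normalizing the loss per target could look like. It assumes a 0/1 float loss mask with one row per sequence. The helpers `normalize_loss_mask` and `masked_loss` are hypothetical and not taken from the code in this change; only the `norm_target_loss` name is borrowed from the list above. Pre-scaling the mask once, for example in the batch pipeline, is one plausible reading of "Move norm to batch pipe", not a confirmed description of it.

```python
def normalize_loss_mask(loss_mask):
    """Scale each row of a 0/1 float mask so its target positions sum to 1."""
    normalized = []
    for row in loss_mask:
        n_targets = sum(row)
        # Rows without any target token get zero weight (avoids dividing by zero).
        scale = 1.0 / n_targets if n_targets > 0 else 0.0
        normalized.append([m * scale for m in row])
    return normalized


def masked_loss(token_losses, loss_mask, norm_target_loss=False):
    """Reduce per-token losses to a scalar using the loss mask.

    norm_target_loss=False: mean over all target tokens in the batch.
    norm_target_loss=True:  mean over sequences of each sequence's own
                            per-token mean, so every target weighs the same.
    """
    if norm_target_loss:
        weights = normalize_loss_mask(loss_mask)
        # Count only sequences that actually contain target tokens.
        denom = sum(1 for row in loss_mask if sum(row) > 0)
    else:
        weights = loss_mask
        denom = sum(sum(row) for row in loss_mask)

    total = sum(
        loss * weight
        for loss_row, weight_row in zip(token_losses, weights)
        for loss, weight in zip(loss_row, weight_row)
    )
    return total / denom if denom else 0.0


# Two sequences: the first has 2 target tokens, the second has 4.
token_losses = [[4.0, 4.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]]
loss_mask = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

per_token = masked_loss(token_losses, loss_mask)
per_target = masked_loss(token_losses, loss_mask, norm_target_loss=True)
```

Without normalization the longer target dominates the average, because it contributes more tokens. With it, each sequence contributes its own mean loss, so short and long targets count equally.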