transformers
99c97633 - Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM` (#37830)

Commit
178 days ago
Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM` (#37830)

fix: :bug: Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into self.loss_function. However, self.loss_function shifts them again, so we are actually doing "next-next-token prediction", which is incorrect.

I have removed the logits shifting before calling self.loss_function.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
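
The double shift described in the commit message can be illustrated with a toy sketch in plain Python. This is not the actual `transformers` code: `shift_for_next_token` is a hypothetical stand-in for a loss helper that does its own shifting, and the strings stand in for logits and label ids. It also assumes, for illustration only, that the caller pre-shifted both predictions and labels.

```python
def shift_for_next_token(predictions, labels):
    # Hypothetical stand-in for a loss helper that shifts internally:
    # the prediction made after reading token t is scored against token t + 1.
    return list(zip(predictions[:-1], labels[1:]))


tokens = ["A", "B", "C", "D", "E"]
# predictions[i] stands in for the logits produced after reading tokens[i]
predictions = [f"pred_after_{t}" for t in tokens]

# Intended behavior: pass unshifted inputs and let the helper shift once.
correct = shift_for_next_token(predictions, tokens)

# Buggy pattern (assumed shape): the caller shifts first, then the helper
# shifts again, so each prediction is scored against the token two steps ahead.
pre_shifted_predictions = predictions[:-1]
pre_shifted_labels = tokens[1:]
double_shifted = shift_for_next_token(pre_shifted_predictions, pre_shifted_labels)

print(correct[0])         # ('pred_after_A', 'B')
print(double_shifted[0])  # ('pred_after_A', 'C')
```

With a single shift, the prediction after `A` is compared with `B`. With the extra shift it is compared with `C`, which matches the "next-next-token" behavior the commit removes.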