transformers
99c97633 - Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM` (#37830)

Commit

178 days ago

Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM` (#37830)

fix: :bug: Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
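The double shift described in the message can be illustrated with a minimal index-only sketch. It is not the `transformers` implementation: `shifted_pairs` is a hypothetical stand-in for a loss function that shifts internally, and the sketch assumes the buggy caller pre-shifted the labels along with the logits. It tracks only which sequence position's logits get compared against which label index.

```python
def shifted_pairs(logit_positions, label_positions):
    """Hypothetical stand-in for a loss function that shifts internally:
    it drops the last logit and the first label, then pairs them up."""
    return list(zip(logit_positions[:-1], label_positions[1:]))


seq_len = 5
positions = list(range(seq_len))  # indices 0..4 into the original sequence

# Fixed call: hand the unshifted logits and labels to the loss function.
fixed = shifted_pairs(positions, positions)

# Buggy call (assumed shape of the old code): shift once in the caller,
# then the loss function shifts a second time.
pre_shifted_logits = positions[:-1]
pre_shifted_labels = positions[1:]
buggy = shifted_pairs(pre_shifted_logits, pre_shifted_labels)

# Offset between the label index and the logit position for every pair.
fixed_offsets = {label - logit for logit, label in fixed}
buggy_offsets = {label - logit for logit, label in buggy}

print("fixed:", fixed, "offsets:", fixed_offsets)
print("buggy:", buggy, "offsets:", buggy_offsets)
```

Under these assumptions, the fixed call pairs each position `t` with label `t + 1` (next-token prediction), while the buggy call pairs position `t` with label `t + 2` and scores one fewer position.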

References

#37830 - Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM`

#39821 - Support MetaCLIP 2

#58 - Add EoMT DINOv3 model

#59 - Fix attention mask handling in EoMT-DINOv3 converter

#41212 - Add EoMT with DINOv3 backbone

#62 - Add initial DEIMv2 model implementation

Author

Phoenix-Shen

Parents

667ad023
