transformers
2370ba6e - Fix Florence2 training-loss double-shift (same pattern as Moonshine #… (#46898)

Commit
5 days ago
Fix Florence2 training-loss double-shift (same pattern as Moonshine #… (#46898)

* Fix Florence2 training-loss double-shift (same pattern as Moonshine #46784)

  Florence2ForConditionalGeneration shifts labels right via shift_tokens_right() to create decoder_input_ids, then calls self.loss_function(), which maps to ForCausalLMLoss; that function shifts labels again internally. This causes the model to train against labels[..., 1:] instead of labels. Replace self.loss_function() with plain CrossEntropyLoss, matching the fix applied to Moonshine in #46784 and CohereASR.

* Fix import sort order for ruff I001
* Use shift_labels parameter instead of CrossEntropyLoss (per reviewer feedback)
* Fix ruff formatting (one param per line)
* Use do_shift_labels flag: only skip shift when shift_tokens_right() was called (per reviewer)
* Simplify: always pass shift_labels=labels (per reviewer feedback)
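
The double shift described in the commit message can be illustrated with a small pure-Python sketch. It is not the library code: `shift_tokens_right` and `causal_lm_targets` below are simplified stand-ins written for illustration (the real helpers operate on tensors and also handle padding), and the token ids and start token are made up. The sketch only shows which target each decoder position ends up being trained against, with and without `shift_labels=labels`.

```python
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def shift_tokens_right(labels, decoder_start_token_id):
    """Simplified stand-in: prepend the start token and drop the last label."""
    return [decoder_start_token_id] + labels[:-1]


def causal_lm_targets(labels, shift_labels=None):
    """Simplified stand-in for how a causal-LM loss picks its targets.

    If shift_labels is given, it is used as-is. Otherwise the labels are
    shifted left by one and padded with IGNORE_INDEX, so that position t
    is scored against labels[t + 1].
    """
    if shift_labels is not None:
        return list(shift_labels)
    return labels[1:] + [IGNORE_INDEX]


labels = [11, 12, 13, 14]  # hypothetical token ids
decoder_input_ids = shift_tokens_right(labels, decoder_start_token_id=2)

# Before the fix: inputs were already shifted right, and the loss shifts again.
buggy_targets = causal_lm_targets(labels)

# After the fix: pass the labels as shift_labels so the loss does not re-shift.
fixed_targets = causal_lm_targets(labels, shift_labels=labels)

for t, (inp, bad, good) in enumerate(
    zip(decoder_input_ids, buggy_targets, fixed_targets)
):
    print(f"t={t} input={inp} buggy_target={bad} fixed_target={good}")
```

In the buggy alignment, the position that receives the start token is scored against the second label rather than the first, and the last position has no target at all. Passing the labels through as `shift_labels` restores the one-to-one alignment that `shift_tokens_right()` was meant to set up.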