Fix Florence2 training-loss double-shift (same pattern as Moonshine #… (#46898)
* Fix Florence2 training-loss double-shift (same pattern as Moonshine #46784)
Florence2ForConditionalGeneration shifts labels right via shift_tokens_right()
to create decoder_input_ids, then calls self.loss_function(), which maps to
ForCausalLMLoss. That function shifts the labels again internally, so the
logits at position t are scored against labels[t + 1] instead of labels[t],
i.e. the model trains against labels[..., 1:] instead of labels.
Replace self.loss_function() with plain CrossEntropyLoss, matching the fix
applied to Moonshine in #46784 and CohereASR.
* Fix import sort order for ruff I001
* Use shift_labels parameter instead of CrossEntropyLoss (per reviewer feedback)
* Fix ruff formatting (one param per line)
* Use do_shift_labels flag: only skip shift when shift_tokens_right() was called (per reviewer)
* Simplify: always pass shift_labels=labels (per reviewer feedback)
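
Illustration of the double shift described above, as a dependency-free sketch
rather than the actual transformers code. scored_pairs() is a hypothetical toy
stand-in for a causal-LM style loss: it only lists which (decoder input, target)
pairs would be scored, and assumes the internal shift pads with an ignore index
of -100. The token ids are made up.

```python
IGNORE_INDEX = -100  # assumed ignore value for padded targets


def shift_tokens_right(labels, decoder_start_token_id):
    """Teacher forcing: prepend the start token and drop the last label."""
    return [decoder_start_token_id] + labels[:-1]


def scored_pairs(decoder_input_ids, labels, shift_labels=None):
    """Toy stand-in for a causal-LM loss (hypothetical helper).

    Returns the (input token, target token) pairs that would be scored.
    """
    if shift_labels is None:
        # Internal shift: position t is scored against labels[t + 1].
        shift_labels = labels[1:] + [IGNORE_INDEX]
    return [
        (tok, tgt)
        for tok, tgt in zip(decoder_input_ids, shift_labels)
        if tgt != IGNORE_INDEX
    ]


labels = [11, 12, 13, 14]
decoder_input_ids = shift_tokens_right(labels, decoder_start_token_id=2)

# Before the fix: inputs already shifted, then the loss shifts again.
double_shifted = scored_pairs(decoder_input_ids, labels)

# After the fix: pre-aligned labels are passed, so no second shift happens.
fixed = scored_pairs(decoder_input_ids, labels, shift_labels=labels)

print(double_shifted)
print(fixed)
```

In the double-shifted case the start token is paired with the second label, so
the first label is never predicted; passing the labels as shift_labels restores
the one-to-one alignment.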