Fix `num_items_in_batch` over-counting for causal LM losses (#46204)
* Fix `num_items_in_batch` over-counting for causal LM losses
* Address review: use `LOSS_MAPPING`, honor pre-shifted labels, add tests
  - Inspect the actual loss function via `LOSS_MAPPING` instead of matching
    `loss_type` strings (catches `CsmForConditionalGeneration` etc.).
  - If the data collator already provides `shift_labels`, count over that
    tensor directly instead of slicing `labels` again.
  - Add unit tests for `_get_num_items_in_batch` covering the causal LM
    path (with and without pre-shifted labels) and the non-causal-LM path.
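
The counting rule described above can be illustrated with a minimal sketch. This is not the Trainer code: `count_loss_tokens` and `IGNORE_INDEX` are hypothetical names, plain Python lists stand in for tensors, and the sketch assumes the causal LM loss drops the first label of each sequence when shifting and ignores positions labelled -100.

```python
IGNORE_INDEX = -100  # assumed label value that the loss skips


def count_loss_tokens(labels, shift_labels=None, is_causal_lm=True):
    """Hypothetical helper: count the label positions that contribute to the loss."""
    if shift_labels is not None:
        # The collator already shifted the labels, so count over them as-is.
        rows = shift_labels
    elif is_causal_lm:
        # A causal LM loss predicts token t+1 from token t, so the first
        # label of every sequence is never a target: drop it before counting.
        rows = [row[1:] for row in labels]
    else:
        # Non-causal losses use every label position unchanged.
        rows = labels
    return sum(tok != IGNORE_INDEX for row in rows for tok in row)


labels = [[5, 6, 7, -100], [8, -100, -100, -100]]
pre_shifted = [[6, 7, -100, -100], [-100, -100, -100, -100]]

unshifted_count = count_loss_tokens(labels, is_causal_lm=False)
causal_count = count_loss_tokens(labels)
pre_shifted_count = count_loss_tokens(labels, shift_labels=pre_shifted)
```

In this example, counting over the unshifted labels yields 4, while both the causal path and the pre-shifted path yield 2. The difference is the over-count: one extra item for each sequence whose first label is not ignored.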
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix `test_train_and_predict_loss_parity`
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>