fix(trainer): Correct loss scaling for incomplete gradient accumulation steps (#39659)
* Fix issue #38837: loss was scaled incorrectly in the last step of an epoch
* chore: trigger CI
* Update src/transformers/trainer.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/modeling_flash_attention_utils.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
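Illustration of the bug class this commit addresses. This is a minimal sketch, not the code in src/transformers/trainer.py. It assumes each micro-batch loss is averaged with equal weight (no token-count weighting). It also assumes an epoch whose batch count is not a multiple of gradient_accumulation_steps, which leaves a shorter final accumulation window. The helpers `split_into_windows` and `accumulated_losses` are hypothetical names chosen for this example.

```python
from typing import List


def split_into_windows(num_batches: int, grad_accum_steps: int) -> List[int]:
    """Hypothetical helper: micro-batches per accumulation window in one epoch.

    The last window is shorter when num_batches is not a multiple of grad_accum_steps.
    """
    full, remainder = divmod(num_batches, grad_accum_steps)
    sizes = [grad_accum_steps] * full
    if remainder:
        sizes.append(remainder)
    return sizes


def accumulated_losses(
    batch_losses: List[float], grad_accum_steps: int, fixed_divisor: bool
) -> List[float]:
    """Hypothetical helper: sum the scaled micro-batch losses in each window.

    fixed_divisor=True always divides by grad_accum_steps, which under-weights
    a short final window. fixed_divisor=False divides by the window's real size.
    """
    results = []
    start = 0
    for size in split_into_windows(len(batch_losses), grad_accum_steps):
        window = batch_losses[start:start + size]
        divisor = grad_accum_steps if fixed_divisor else size
        results.append(sum(loss / divisor for loss in window))
        start += size
    return results


if __name__ == "__main__":
    # 10 micro-batches with loss 2.0, accumulated over 4 steps -> windows of 4, 4, 2.
    losses = [2.0] * 10
    print(accumulated_losses(losses, 4, fixed_divisor=True))   # [2.0, 2.0, 1.0]
    print(accumulated_losses(losses, 4, fixed_divisor=False))  # [2.0, 2.0, 2.0]
```

With a fixed divisor, the final two-batch window reports half the true average loss, so its optimizer step is under-weighted. Dividing by the actual number of micro-batches in the window keeps every step on the same scale.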
---------
Co-authored-by: taihang <taihang@U-2RHYVWX7-2207.local>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>