🚨🚨🚨 [Trainer] Enable `average_tokens_across_devices` by default in `TrainingArguments` (#39395)
Fixes #39392
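This change makes the loss calculation correct for multi-GPU training by default: with `average_tokens_across_devices` enabled, the loss is normalized by the token count summed across all devices rather than by each device's local token count. Pass `average_tokens_across_devices=False` to `TrainingArguments` to restore the previous behavior.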
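To see why the normalization matters, here is a minimal sketch of the arithmetic only, not the Trainer implementation. The helper names and the numbers are hypothetical and chosen for illustration; they assume two devices holding very different token counts.

```python
def mean_of_device_means(loss_sums, token_counts):
    """Average each device's per-token loss, then average those means."""
    per_device = [s / n for s, n in zip(loss_sums, token_counts)]
    return sum(per_device) / len(per_device)


def global_token_mean(loss_sums, token_counts):
    """Divide the total summed loss by the total token count across devices."""
    return sum(loss_sums) / sum(token_counts)


# Hypothetical batch: device 0 holds 10 tokens, device 1 holds 90 tokens.
loss_sums = [20.0, 90.0]   # summed token losses per device
token_counts = [10, 90]    # non-padding tokens per device

naive = mean_of_device_means(loss_sums, token_counts)  # weights both devices equally
exact = global_token_mean(loss_sums, token_counts)     # weights every token equally

print(f"mean of per-device means: {naive:.2f}")
print(f"global per-token mean:    {exact:.2f}")
```

The two values differ whenever devices see different numbers of tokens. Averaging per-device means over-weights the tokens on the device with fewer of them.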
Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>