enable average tokens across devices (#34373)
* enable average tokens across devices
* all-reduce the token count earlier, in case the model needs it
* simplify if statement
* reformat code to make ruff happy
* add doc for argument: average_tokens_across_devices
* cannot determine world size when PyTorch is unavailable
* format code
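Why averaging tokens across devices matters can be sketched in plain Python. This is a simulation under stated assumptions, not the code from this PR: per-token losses are assumed to be available on each device, the all-reduce is faked with a local sum, and all function names are hypothetical. Averaging per-device means weights each device equally regardless of how many tokens it saw. Dividing each device's loss sum by the global token count, then scaling by the number of devices to cancel the cross-device averaging, recovers the true per-token mean.

```python
from typing import List


def simulated_all_reduce_sum(values: List[float]) -> float:
    # Hypothetical stand-in for an all-reduce (sum) collective across devices.
    return sum(values)


def per_device_mean_loss(token_losses_per_device: List[List[float]]) -> float:
    # Each device averages over its own tokens only; the per-device results are
    # then averaged across devices, so every device gets equal weight.
    means = [sum(losses) / len(losses) for losses in token_losses_per_device]
    return sum(means) / len(means)


def global_token_mean_loss(token_losses_per_device: List[List[float]]) -> float:
    num_devices = len(token_losses_per_device)
    # All-reduce the token counts so every device knows the global token count.
    global_tokens = simulated_all_reduce_sum(
        [len(losses) for losses in token_losses_per_device]
    )
    # Each device divides its local loss sum by the global token count and
    # scales by num_devices to cancel the 1/num_devices from cross-device averaging.
    per_device = [
        sum(losses) / global_tokens * num_devices
        for losses in token_losses_per_device
    ]
    # Cross-device averaging (as gradient averaging would do).
    return sum(per_device) / num_devices


# Device 0 sees 3 tokens with loss 1.0; device 1 sees 1 token with loss 4.0.
losses = [[1.0, 1.0, 1.0], [4.0]]
print(per_device_mean_loss(losses))    # 2.5: the short sequence is over-weighted
print(global_token_mean_loss(losses))  # 1.75: equals 7.0 total loss / 4 tokens
```

With unequal token counts per device, only the second result matches the single-device per-token mean.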
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>