[`Mixtral`] Fix loss + nits (#28115)
* default config should not use sliding window
* update the doc
* nits
* add a proper test
* update
* update
* update expected value
* Update src/transformers/tokenization_utils_fast.py
* convert to float
* average then N**2
* comment
* revert nit
* good to go
* fixup
* Update tests/models/mixtral/test_modeling_mixtral.py
* revert unrelated change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>