[Patch-t5-tokenizer] Patch the changes on T5 to make sure the previous behaviour is still valid for the beginning of words (#24622)
* patch `_tokenize` function
* more tests
* properly fix
* fixup
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix without ifs
* update
* protect import
* add python processing
* `is_first` needed
* add doc and update with legacy
* update
* fix T5 SPM converter
* styling
* fix T5 warning
* add is_seqio_available
* remove `is_first`
* revert some changes
* more tests and update
* update llama test battery
* fixup
* refactor T5 spm common tests
* draft the llama tests
* update
* update test
* nits
* refine
* name nit
* fix t5 tests
* fix T5
* update
* revert convert slow to fast changes that fail lots of tests
* legacy support
* fixup
* nits: `is_first` not defined
* don't use legacy behaviour for switch transformers
* style
* My attempt to check.
* nits
* fixes
* update
* fixup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates
* fixup
* add legacy warning
* fixup
* warning_once nit
* update t5 documentation test
* update llama tok documentation
* add space to warning
* nits
* nit
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* last nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>