transformers
b4d55488 - 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 (#25224)

Commit

2 years ago

🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 (#25224) * fix EVERYTHING * more fixes * ⚗️⚗️ Tokenizer magic ⚗️⚗️ * wrong value but test passes for the TODO * update * updat * safe protobuf import? * style * non gated repo * update * fixup * Update src/transformers/models/llama/tokenization_llama.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llama/tokenization_llama.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/t5/test_tokenization_t5.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits * fix t5 too * use assert equal * fix llama decoding * nits on t5 * fixup * only remove the prefix space, not other spaces * more deconding tests and more todos * fix CI as well * fixup * skip failing test on CI (its tf its ok) * skip test_subword_regularization_tokenizer that is also crashing on the CI for TF * update llama * revert good fixes * fixup * empty * explain why we need to encode with an additional token * better warning? * nits --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

References

#25224 - 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨

Author

ArthurZucker

Parents

5347d000

transformers b4d55488 - 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 (#25224)

transformers
b4d55488 - 🚨🚨🚨 [`SPM`] Finish fix spm models 🚨🚨🚨 (#25224)