transformers
73a13f86 - Refactor-tokenization-more (#42563)

Commit
61 days ago
Refactor-tokenization-more (#42563)

* On commit to bind them all
* nits
* smnall update
* elif
* super small nit
* BPE!
* fix
* up up up
* fix?
* one typo
* per model updates
* more model specific updates
* more per model updates
* more model specific updates
* simplify default merges
* fiuxp
* update
* update
* style
* fix
* fix colpali
* nits
* simpler regex + big shitty bird
* fixup and fix
* fix codellama
* up
* fix pop on none
* fix parkeet
* fix llama
* big fixup
* fix markul lm
* update common
* fix mbart
* fix seamlessm4T
* fix comment
* torch tests
* nnits and revert UNK idx change
* oh only one deberta
* torch tests
* add convert from spm per model!
* fix last 2 for pegasus
* fix torch tests
* fixes
* fix tests
* check versioned files
* fix processor auto test
* fix custom tok clip
* try this fix
* modeling rag
* fix rag
* roformer the Tokenizers way
* up
* updatge
* fix unk
* update
* fix roberta
* if there is no mapped class and no tokenizer.json its fucked -> just have the mapped class ready!
* fix the rest
* fix copies
* fix doc and copies
* fix mbart50
* fix deberta_v2 test
* fix and simplify whisper :)
* fix big bird default was worng
* fix final
* fixup
* small nit
* a weird way to fix fuyu?
* default xlm roberta to fix kosmo behaviour!
* remove small errors
* last fix?
* fix pixtral
* style
* fix
* quality ta radce
* fix?
* remove something
* remov one code that shouldd not have been there!
* fix ?
* fixup
* update
* fix for custom code
* add a custom model path to make sure custom stuff is registe
* fix trust remote code
* exceeded
* don't
* ouppsy for cohere
* why is this one also affected
* fixup
* fixup
* nits
* fix idefics3 tests
* okay read the processor
* fix the layout.... models
* nits
* codellama needs the bos passed
* fix dpr
* fix?
* fixup
* distilbert defaults
* fix
* clvp update to PythonTokenizer
* bloom
* style
* layoutxlm
* style
* olmo
* only pop when we don't convert from tokenizer.json
* fixup
* hub issue
* id
* fix

---------

Co-authored-by: itazap <ita.zaporozhets@huggingface.co>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>