transformers
rm slow tokenizers
#40936
Merged

ArthurZucker merged 277 commits into main from one_tokenizer
itazap changed the title from "rm slow tokenizer llama" to "rm slow tokenizers" 190 days ago
ArthurZucker commented on 2025-09-22
itazap force pushed from 54346010 to 6a5de093 184 days ago
itazap force pushed from 6a5de093 to 3cc11a95 184 days ago
itazap force pushed from 3cc11a95 to af77c18d 184 days ago
itazap force pushed from af77c18d to dc0611f7 184 days ago
itazap commented on 2025-09-25
itazap marked this pull request as draft 179 days ago
ArthurZucker commented on 2025-10-03
itazap force pushed from 0e0a75f7 to 6c25f26f 172 days ago
itazap fixes missed
5fe5666c
itazap gemma test fix
51e62e1f
itazap requested a review from ArthurZucker 165 days ago
ArthurZucker commented on 2025-10-14
ArthurZucker commented on 2025-10-14
itazap refactor
0e5dbdf4
itazap rm legacy from llama
9136d3c8
itazap added renaming
ab77f57b
itazap add _model
36bc3ef6
itazap update legacy
c4f045c4
itazap update legacy
c80dd1db
itazap fix docstring
790c0923
itazap requested a review from ArthurZucker 165 days ago
itazap always load blank, then set _tokenizer if we have it
f4d956a2
itazap new toks
b2c320c2
itazap update all berttokenizer based models
0c3caff0
ArthurZucker commented on 2025-10-16
itazap apply feedback - delete bert duplicates
d43412a3
itazap more models --> fast only
48eeb50c
itazap more convert_slow models
d3a3cbd6
itazap fix common test refs
493f9e0b
itazap updating fast only tokenizers
a51cea01
itazap openai and pegasus
d9c1ec33
itazap enable sentencepiecebackend
d879bc3e
itazap more models
ca510297
itazap code gen
132c617e
itazap t5
ed5bf863
itazap code gen tests
158b4448
itazap speecht5
64eaf880
itazap mbart
95f48d3f
itazap mbart50
f3248d2c
itazap more models
f3dd1030
itazap more models
c66037d9
itazap layouglmv2
cb5e08b5
itazap update tests
31590335
itazap update tests
a14a45d3
itazap update tests
7ca10f8b
itazap pretrainedtokenizer
f5cbc494
itazap whisper
72e8043f
itazap whisper
3cd8e5b4
itazap layoutxlm and storing backends
4bf2b85a
itazap refactor sentencepiecebackend and additional_special_tokens
2ef0fd37
itazap renaming tokenization_utils --> tokenization_python
5c7d347f
itazap udpate tests
fcf67ff8
itazap bert test
a8ccf164
itazap blenderbot
ccca98e4
itazap clip
c118c106
itazap codegen
0f740815
itazap code_llama
a11dba71
itazap cohere
b678cde8
itazap deberata, deberat v2, funnel
ea9a5465
itazap gpt2
ffbdecf8
itazap batch update tests
9f08ade7
itazap pegasus qwen2 roberta
a7cd5c08
itazap marked this pull request as ready for review 148 days ago
itazap more models
b5b3cd98
itazap layout tests
1250bcc8
itazap some renaming
cf72cae4
itazap fix references to utils_fast
4fafdcc2
itazap fix refs
236f9f18
itazap fix refs
cd743bfd
itazap fix refs
0e7e5939
itazap fix refs
2af6d2cb
itazap fix refs
b58b7b1e
itazap fix refs
518dcaf6
itazap fix refs
0f2f4b6a
itazap requested a review from ArthurZucker 146 days ago
itazap fix some tests
c8491486
itazap regression
0d54bbd6
itazap fix refs
81a140a5
itazap fix refs
61366d6a
itazap missed the most crucial file in my last commit
4374a66e
itazap fix refs
df383d75
itazap fix refs
b8035eca
itazap fix refs
37e1b925
itazap batch encode fix
9b45774d
itazap fix some tests
a24856d8
itazap BC for batch_decode bc too many refs
18688703
itazap more tests
35dd2509
itazap fix more tests
b0428f3b
itazap fix for processors
8fe6873a
itazap fixing more models
c1e0e461
itazap deleted mbart50 by accident
79568cdd
itazap seamless m4t
cfa159a3
itazap force pushed from 977c5324 to cfa159a3 137 days ago
itazap albert fix
5854f4c8
itazap whisper
714a856e
itazap layout3
c016f114
itazap attempt to fix cached tokenizers on CI
2e3e1780
itazap trying another fix on CI
03e3ab9f
itazap again try to work around CI
2c30d79a
itazap bertweet
98f51d55
itazap tapas
96f0517c
itazap mbart50
c26f54b8
itazap luke
da0bbf0c
itazap mluke
494ef3e3
itazap markuplm
39bb8847
itazap markuplm
960dfcf3
itazap fix some more auto tests
54992a07
itazap some random model failures
d0383bdb
itazap mistralcommontestser
a969c6b6
itazap more fixes
2bf4a13c
itazap ref fix
e88322fb
itazap siglip
cfb0100a
itazap marian
0fd10662
itazap plbart
02c524c2
itazap update utils toks
820191e6
itazap seamless m4t
0cd714d0
itazap roc bert
8a412bc7
itazap udpate byt5 test
e8c32585
itazap xlm
85a3b1f6
itazap esm
45e718f7
itazap roformer
96fc4675
itazap code llama
7727e3b5
itazap biogpt
6795515d
itazap m2m100
2f49a392
itazap force pushed from 6f08e64f to 2f49a392 130 days ago
itazap dpr and flaubert
a42e7a81
itazap xlm and speech to text
33634bef
itazap tok backend pass object
ca5e3891
itazap tokenizer object pass
25021d4d
itazap wav2vec2
69610fec
itazap wav2vec2
51799caf
itazap cpmant
f23abc3e
itazap update utils tokenizers
88f0db5c
itazap cpmant
077e6f88
itazap bartpho
e004b56b
itazap force pushed from 9b8a9b5d to e004b56b 130 days ago
itazap test apply chat template assistant mask
e069763c
itazap apply chat template video
9df9cfc5
itazap apply chat template assistant mask
dc9b1aec
itazap test torch
4c05e9df
itazap update from slow in base and fix donut processor errors
5c209a40
itazap auto to point to tokenizers backend, fix kosmos2
d8a8db8e
itazap force pushed from 11b57e65 to d8a8db8e 129 days ago
itazap some non model fixes for old slow models that no longer have their ow…
6b40d915
itazap missed file from last commit
976265bc
itazap idefics2
b6ca8b25
itazap force pushed from b6ca8b25 to 976265bc 129 days ago
ArthurZucker fixup
5c721057
ArthurZucker fixup
964b461b
itazap pretrained tokenizer fast test update
03814073
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
887b4776
ArthurZucker stash
f4c46ab5
ArthurZucker Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
efbbb043
ArthurZucker bad merged
71ef2822
ArthurZucker cherry pick more stuff that did not merge well
a5b018c8
ArthurZucker fix gptsw3
8ea91f65
ArthurZucker nit warn for now
19478948
ArthurZucker update error raising
20a06ffe
ArthurZucker just ran fixup
aa197a04
ArthurZucker bring back bert legacy
63c7c1c2
ArthurZucker fix
5895bab5
ArthurZucker nit
6b8217b6
ArthurZucker fix 56 errors on blenderbotsmall?
184ed581
ArthurZucker 18 for blenderbotsmall
09e4021f
itazap force pushed from adb317e8 to 09e4021f 128 days ago
itazap tok auto
a8c299e7
itazap missed clip
12590525
itazap fix tests
06e3485a
itazap something missed
3a95bf18
itazap token healing
05d5c08c
itazap tok common tests update - nonmodel
78f4e586
itazap try to fix non-model test in test_tokenization_utils
8fbaf836
itazap fix hub tests
fd40b1ba
itazap try to fix hub tests
70330b85
itazap custom vocab related fixed
7c780070
itazap bert jap
ca1f6b09
itazap BERT JAP
dd3ae59a
itazap rename bert legacy to bert legacy
2e1893f7
itazap Wav2vec2
f4be6a90
itazap fix in tok python to update total vocab size - fixes speech t5
919103ac
itazap blender bot small
c452f924
itazap forgot test file
6d167eb9
itazap test failures
025722be
itazap marian
7d1d0d33
itazap gpt2 tiktoken
dfb67a42
itazap big bird / marian
51da6b28
itazap udop
c611058e
itazap forgot couple changes
cc4a9721
itazap test_serve fix
51202daa
itazap missing import
ca988b90
itazap a couple processors fixes
f5bc69ef
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
c67de105
ArthurZucker style partly
045bbffa
itazap fix to fetch tests ci
75662fd4
itazap Revert branch back to commit f5bc69ef state
8d248a39
itazap revert branch to styling
4c299246
itazap update mistral after merge
189cabd5
github-actions
itazap fixes for non model tests
e02741c5
itazap some processor test fixes
b828ae16
itazap more processor test fixes
83b579cf
itazap more processor fixes
2ce27bcd
itazap hub tests
881b97cf
itazap force pushed from f1e1ad94 to 881b97cf 122 days ago
itazap python tok utils
2e28b3da
itazap fix hub test
925d1873
itazap force pushed from 94b3f013 to 925d1873 122 days ago
itazap force pushed from 1e32c326 to 925d1873 122 days ago
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
66242316
ArthurZucker make style for now
437321b8
ArthurZucker remove problemattic fic copies
cd4d3ac9
ArthurZucker python utils/check_copies.py --fix_and_overwrite
5c5864f5
ArthurZucker more styling
2f13c132
ArthurZucker commented on 2025-11-17
ArthurZucker fixup
1e1aa11c
ArthurZucker silence docstirng
5eeb1fed
ArthurZucker fix import?
dea8e1ef
ArthurZucker fix imports
452d6d88
ArthurZucker add the local test as well
e6502059
itazap throw spm error
3dd17161
itazap force pushed from 0059deea to 3dd17161 121 days ago
itazap llamas
e700dfa7
itazap fix a couple tests
ce23d672
itazap broke ci
ff1bf368
itazap broke ci
0bdfeae1
itazap broke ci
a1376493
itazap broke ci
366597c9
itazap add logs to debug gemma on ci
22887b1c
itazap gemma and llama
73819f44
itazap gemma
c24c9970
itazap revert las commit
551a959b
itazap gemma debug
a18e84dc
itazap gemma debug
c23ee139
itazap force pushed from 3ac4620c to c23ee139 121 days ago
itazap force pushed from dd6c61c9 to c23ee139 121 days ago
itazap gemma
93187b3e
itazap safely import spiece backend
81428ef7
itazap tok tests
eb95c2e8
itazap check none
24d89c4c
itazap setup and qual
e2c44345
itazap ruff
7a737b77
itazap del dev files
a19c90c1
itazap force pushed from 49e491bb to a19c90c1 121 days ago
itazap tok auto
18e74845
itazap fill docstrings
3cdd8ee8
itazap update auto
50756c49
itazap blenderbot small nit
6bccb46c
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
a76015ab
ArthurZucker add migration guide
4afb5706
ArthurZucker move mixtral patch to `TokenizersBackend`, move `TokenizerExtractor`
be1d95a1
ArthurZucker rename MistralCommonTokenizer to MistralCommonBackend
fad31d7c
ArthurZucker Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
d4aff20f
ArthurZucker nit
3ab4becd
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
0c1a40a5
ArthurZucker fix failures
30f16402
ArthurZucker fixup
f2a14826
ArthurZucker remoove one old test
d8010f85
ArthurZucker mark the slow one as slow
82e56759
ArthurZucker very small fixes
088fc39a
ArthurZucker update auto mapping for missing ones
f677ddf7
ArthurZucker fixup lorsd
d30e46b7
ArthurZucker fixup doc and stuff
ad24f43c
ArthurZucker should be the final fixe
ebfe7f19
ArthurZucker processing update
c4a743d2
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
f81a9668
ArthurZucker update
9a5638dd
ArthurZucker FIX or brute AI fix the llava test
7c32dfbb
ArthurZucker style
c520a66a
ArthurZucker slow?
718b2f03
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
20d9036e
ArthurZucker fix is offline mode?
8f536c2e
ArthurZucker fix mt5
e96c18b3
itazap One tok utils (#42462)
5ce65b8e
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
4418e8a9
ArthurZucker fix cohere
7f9954a1
ArthurZucker Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
bfa5fd0a
ArthurZucker added for_v5?
ArthurZucker added Core: Tokenization
ArthurZucker ?
4dce834e
ArthurZucker up
fcdc9bb8
ArthurZucker am I dumbb?
a5a3a7c8
ArthurZucker grumble
0244be9b
ArthurZucker merged 05c0e1d3 into main 120 days ago
ArthurZucker deleted the one_tokenizer branch 120 days ago
