transformers
rm slow tokenizers
#40936
Merged
ArthurZucker
merged 277 commits into
main
from
one_tokenizer
itazap
changed the title
rm slow tokenizer llama
rm slow tokenizers
190 days ago
ArthurZucker
commented on 2025-09-22
itazap
force pushed
from
54346010
to
6a5de093
184 days ago
itazap
force pushed
from
6a5de093
to
3cc11a95
184 days ago
itazap
force pushed
from
3cc11a95
to
af77c18d
184 days ago
itazap
force pushed
from
af77c18d
to
dc0611f7
184 days ago
itazap
commented on 2025-09-25
itazap
marked this pull request as draft
179 days ago
ArthurZucker
commented on 2025-10-03
itazap
force pushed
from
0e0a75f7
to
6c25f26f
172 days ago
fixes missed
5fe5666c
gemma test fix
51e62e1f
itazap
requested a review
from
ArthurZucker
165 days ago
ArthurZucker
commented on 2025-10-14
ArthurZucker
commented on 2025-10-14
refactor
0e5dbdf4
rm legacy from llama
9136d3c8
added renaming
ab77f57b
add _model
36bc3ef6
update legacy
c4f045c4
update legacy
c80dd1db
fix docstring
790c0923
itazap
requested a review
from
ArthurZucker
165 days ago
always load blank, then set _tokenizer if we have it
f4d956a2
new toks
b2c320c2
update all berttokenizer based models
0c3caff0
ArthurZucker
commented on 2025-10-16
apply feedback - delete bert duplicates
d43412a3
more models --> fast only
48eeb50c
more convert_slow models
d3a3cbd6
fix common test refs
493f9e0b
updating fast only tokenizers
a51cea01
openai and pegasus
d9c1ec33
enable sentencepiecebackend
d879bc3e
more models
ca510297
code gen
132c617e
t5
ed5bf863
code gen tests
158b4448
speecht5
64eaf880
mbart
95f48d3f
mbart50
f3248d2c
more models
f3dd1030
more models
c66037d9
layouglmv2
cb5e08b5
update tests
31590335
update tests
a14a45d3
update tests
7ca10f8b
pretrainedtokenizer
f5cbc494
whisper
72e8043f
whisper
3cd8e5b4
layoutxlm and storing backends
4bf2b85a
refactor sentencepiecebackend and additional_special_tokens
2ef0fd37
renaming tokenization_utils --> tokenization_python
5c7d347f
udpate tests
fcf67ff8
bert test
a8ccf164
blenderbot
ccca98e4
clip
c118c106
codegen
0f740815
code_llama
a11dba71
cohere
b678cde8
deberata, deberat v2, funnel
ea9a5465
gpt2
ffbdecf8
batch update tests
9f08ade7
pegasus qwen2 roberta
a7cd5c08
itazap
marked this pull request as ready for review
148 days ago
more models
b5b3cd98
layout tests
1250bcc8
some renaming
cf72cae4
fix references to utils_fast
4fafdcc2
fix refs
236f9f18
fix refs
cd743bfd
fix refs
0e7e5939
fix refs
2af6d2cb
fix refs
b58b7b1e
fix refs
518dcaf6
fix refs
0f2f4b6a
itazap
requested a review
from
ArthurZucker
146 days ago
fix some tests
c8491486
regression
0d54bbd6
fix refs
81a140a5
fix refs
61366d6a
missed the most crucial file in my last commit
4374a66e
fix refs
df383d75
fix refs
b8035eca
fix refs
37e1b925
batch encode fix
9b45774d
fix some tests
a24856d8
BC for batch_decode bc too many refs
18688703
more tests
35dd2509
fix more tests
b0428f3b
fix for processors
8fe6873a
fixing more models
c1e0e461
deleted mbart50 by accident
79568cdd
seamless m4t
cfa159a3
itazap
force pushed
from
977c5324
to
cfa159a3
137 days ago
albert fix
5854f4c8
whisper
714a856e
layout3
c016f114
attempt to fix cached tokenizers on CI
2e3e1780
trying another fix on CI
03e3ab9f
again try to work around CI
2c30d79a
bertweet
98f51d55
tapas
96f0517c
mbart50
c26f54b8
luke
da0bbf0c
mluke
494ef3e3
markuplm
39bb8847
markuplm
960dfcf3
fix some more auto tests
54992a07
some random model failures
d0383bdb
mistralcommontestser
a969c6b6
more fixes
2bf4a13c
ref fix
e88322fb
siglip
cfb0100a
marian
0fd10662
plbart
02c524c2
update utils toks
820191e6
seamless m4t
0cd714d0
roc bert
8a412bc7
udpate byt5 test
e8c32585
xlm
85a3b1f6
esm
45e718f7
roformer
96fc4675
code llama
7727e3b5
biogpt
6795515d
m2m100
2f49a392
itazap
force pushed
from
6f08e64f
to
2f49a392
130 days ago
dpr and flaubert
a42e7a81
xlm and speech to text
33634bef
tok backend pass object
ca5e3891
tokenizer object pass
25021d4d
wav2vec2
69610fec
wav2vec2
51799caf
cpmant
f23abc3e
update utils tokenizers
88f0db5c
cpmant
077e6f88
bartpho
e004b56b
itazap
force pushed
from
9b8a9b5d
to
e004b56b
130 days ago
test apply chat template assistant mask
e069763c
apply chat template video
9df9cfc5
apply chat template assistant mask
dc9b1aec
test torch
4c05e9df
update from slow in base and fix donut processor errors
5c209a40
auto to point to tokenizers backend, fix kosmos2
d8a8db8e
itazap
force pushed
from
11b57e65
to
d8a8db8e
129 days ago
some non model fixes for old slow models that no longer have their ow…
6b40d915
missed file from last commit
976265bc
idefics2
b6ca8b25
itazap
force pushed
from
b6ca8b25
to
976265bc
129 days ago
fixup
5c721057
fixup
964b461b
pretrained tokenizer fast test update
03814073
Merge branch 'main' of github.com:huggingface/transformers into one_t…
887b4776
stash
f4c46ab5
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
efbbb043
bad merged
71ef2822
cherry pick more stuff that did not merge well
a5b018c8
fix gptsw3
8ea91f65
nit warn for now
19478948
update error raising
20a06ffe
just ran fixup
aa197a04
bring back bert legacy
63c7c1c2
fix
5895bab5
nit
6b8217b6
fix 56 errors on blenderbotsmall?
184ed581
18 for blenderbotsmall
09e4021f
itazap
force pushed
from
adb317e8
to
09e4021f
128 days ago
tok auto
a8c299e7
missed clip
12590525
fix tests
06e3485a
something missed
3a95bf18
token healing
05d5c08c
tok common tests update - nonmodel
78f4e586
try to fix non-model test in test_tokenization_utils
8fbaf836
fix hub tests
fd40b1ba
try to fix hub tests
70330b85
custom vocab related fixed
7c780070
bert jap
ca1f6b09
BERT JAP
dd3ae59a
rename bert legacy to bert legacy
2e1893f7
Wav2vec2
f4be6a90
fix in tok python to update total vocab size - fixes speech t5
919103ac
blender bot small
c452f924
forgot test file
6d167eb9
test failures
025722be
marian
7d1d0d33
gpt2 tiktoken
dfb67a42
big bird / marian
51da6b28
udop
c611058e
forgot couple changes
cc4a9721
test_serve fix
51202daa
missing import
ca988b90
a couple processors fixes
f5bc69ef
Merge branch 'main' of github.com:huggingface/transformers into one_t…
c67de105
style partly
045bbffa
fix to fetch tests ci
75662fd4
Revert branch back to commit f5bc69ef state
8d248a39
revert branch to styling
4c299246
update mistral after merge
189cabd5
fixes for non model tests
e02741c5
some processor test fixes
b828ae16
more processor test fixes
83b579cf
more processor fixes
2ce27bcd
hub tests
881b97cf
itazap
force pushed
from
f1e1ad94
to
881b97cf
122 days ago
python tok utils
2e28b3da
fix hub test
925d1873
itazap
force pushed
from
94b3f013
to
925d1873
122 days ago
itazap
force pushed
from
1e32c326
to
925d1873
122 days ago
Merge branch 'main' of github.com:huggingface/transformers into one_t…
66242316
make style for now
437321b8
remove problemattic fic copies
cd4d3ac9
python utils/check_copies.py --fix_and_overwrite
5c5864f5
more styling
2f13c132
ArthurZucker
commented on 2025-11-17
fixup
1e1aa11c
silence docstirng
5eeb1fed
fix import?
dea8e1ef
fix imports
452d6d88
add the local test as well
e6502059
throw spm error
3dd17161
itazap
force pushed
from
0059deea
to
3dd17161
121 days ago
llamas
e700dfa7
fix a couple tests
ce23d672
broke ci
ff1bf368
broke ci
0bdfeae1
broke ci
a1376493
broke ci
366597c9
add logs to debug gemma on ci
22887b1c
gemma and llama
73819f44
gemma
c24c9970
revert las commit
551a959b
gemma debug
a18e84dc
gemma debug
c23ee139
itazap
force pushed
from
3ac4620c
to
c23ee139
121 days ago
itazap
force pushed
from
dd6c61c9
to
c23ee139
121 days ago
gemma
93187b3e
safely import spiece backend
81428ef7
tok tests
eb95c2e8
check none
24d89c4c
setup and qual
e2c44345
ruff
7a737b77
del dev files
a19c90c1
itazap
force pushed
from
49e491bb
to
a19c90c1
121 days ago
tok auto
18e74845
fill docstrings
3cdd8ee8
update auto
50756c49
blenderbot small nit
6bccb46c
Merge branch 'main' of github.com:huggingface/transformers into one_t…
a76015ab
add migration guide
4afb5706
move mixtral patch to `TokenizersBackend`, move `TokenizerExtractor`
be1d95a1
rename MistralCommonTokenizer to MistralCommonBackend
fad31d7c
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
d4aff20f
nit
3ab4becd
Merge branch 'main' of github.com:huggingface/transformers into one_t…
0c1a40a5
fix failures
30f16402
fixup
f2a14826
remoove one old test
d8010f85
mark the slow one as slow
82e56759
very small fixes
088fc39a
update auto mapping for missing ones
f677ddf7
fixup lorsd
d30e46b7
fixup doc and stuff
ad24f43c
should be the final fixe
ebfe7f19
processing update
c4a743d2
Merge branch 'main' of github.com:huggingface/transformers into one_t…
f81a9668
update
9a5638dd
FIX or brute AI fix the llava test
7c32dfbb
style
c520a66a
slow?
718b2f03
Merge branch 'main' of github.com:huggingface/transformers into one_t…
20d9036e
fix is offline mode?
8f536c2e
fix mt5
e96c18b3
One tok utils (#42462)
5ce65b8e
Merge branch 'main' of github.com:huggingface/transformers into one_t…
4418e8a9
fix cohere
7f9954a1
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
bfa5fd0a
ArthurZucker
added
for_v5?
ArthurZucker
added
Core: Tokenization
4dce834e
up
fcdc9bb8
am I dumbb?
a5a3a7c8
grumble
0244be9b
ArthurZucker
merged
05c0e1d3
into main
120 days ago
ArthurZucker
deleted the one_tokenizer branch
120 days ago
Reviewers
ArthurZucker
Assignees
No one assigned
Labels
Core: Tokenization
for_v5?
Milestone
No milestone