initial clean #42415

itazap wants to merge 211 commits into main from update_special_tokens
itazap rm slow
d7af5a54
itazap rm protobuf dependency
73be8c48
itazap create_fast_tokenizer file
87cfea8a
itazap move update post processor and add bos eos properties
d5e56bbd
itazap llama
dc0611f7
itazap simplify test
f2022400
itazap handle blank tok
cacf09e8
itazap save tests
26e08874
itazap rm old common tests
e4b29559
itazap llama refactored test - mixin temporary
ba3a0a46
itazap add qwen2
117ce1dc
itazap rm slow qwen2 tok
42d4e798
itazap qwen2
7fb3d772
itazap rm call_one and batch_encode_plus
21433e18
itazap rm prepare_for_model
db8923c2
itazap cohere
19138cbe
itazap gemma
ec13e398
itazap split up tests and remove common ones that should not be run for each…
6c25f26f
itazap load PreTrainedSentencePieceTokenizer fallback
193684d9
itazap rm functions dedicated for batched input
d411492d
itazap spiece tests
3ee35253
itazap cut base down
e0a260d5
itazap cleaned up base to be more abstract for other backends to implement
82653f78
itazap speed up added tokens
14d2a8ca
itazap revert _pad
4980a2fd
itazap rm specialtokenmixin and stale functions
19c9b098
itazap rm pickle tests
a9263d1d
itazap fixes missed
5fe5666c
itazap gemma test fix
51e62e1f
itazap refactor
0e5dbdf4
itazap rm legacy from llama
9136d3c8
itazap added renaming
ab77f57b
itazap add _model
36bc3ef6
itazap update legacy
c4f045c4
itazap update legacy
c80dd1db
itazap fix docstring
790c0923
itazap always load blank, then set _tokenizer if we have it
f4d956a2
itazap new toks
b2c320c2
itazap update all berttokenizer based models
0c3caff0
itazap apply feedback - delete bert duplicates
d43412a3
itazap more models --> fast only
48eeb50c
itazap more convert_slow models
d3a3cbd6
itazap fix common test refs
493f9e0b
itazap updating fast only tokenizers
a51cea01
itazap openai and pegasus
d9c1ec33
itazap enable sentencepiecebackend
d879bc3e
itazap more models
ca510297
itazap code gen
132c617e
itazap t5
ed5bf863
itazap code gen tests
158b4448
itazap speecht5
64eaf880
itazap mbart
95f48d3f
itazap mbart50
f3248d2c
itazap more models
f3dd1030
itazap more models
c66037d9
itazap layoutlmv2
cb5e08b5
itazap update tests
31590335
itazap update tests
a14a45d3
itazap update tests
7ca10f8b
itazap pretrainedtokenizer
f5cbc494
itazap whisper
72e8043f
itazap whisper
3cd8e5b4
itazap layoutxlm and storing backends
4bf2b85a
itazap refactor sentencepiecebackend and additional_special_tokens
2ef0fd37
itazap renaming tokenization_utils --> tokenization_python
5c7d347f
itazap update tests
fcf67ff8
itazap bert test
a8ccf164
itazap blenderbot
ccca98e4
itazap clip
c118c106
itazap codegen
0f740815
itazap code_llama
a11dba71
itazap cohere
b678cde8
itazap deberta, deberta v2, funnel
ea9a5465
itazap gpt2
ffbdecf8
itazap batch update tests
9f08ade7
itazap pegasus qwen2 roberta
a7cd5c08
itazap more models
b5b3cd98
itazap layout tests
1250bcc8
itazap some renaming
cf72cae4
itazap fix references to utils_fast
4fafdcc2
itazap fix refs
236f9f18
itazap fix refs
cd743bfd
itazap fix refs
0e7e5939
itazap fix refs
2af6d2cb
itazap fix refs
b58b7b1e
itazap fix refs
518dcaf6
itazap fix refs
0f2f4b6a
itazap fix some tests
c8491486
itazap regression
0d54bbd6
itazap fix refs
81a140a5
itazap fix refs
61366d6a
itazap missed the most crucial file in my last commit
4374a66e
itazap fix refs
df383d75
itazap fix refs
b8035eca
itazap fix refs
37e1b925
itazap batch encode fix
9b45774d
itazap fix some tests
a24856d8
itazap BC for batch_decode bc too many refs
18688703
itazap more tests
35dd2509
itazap fix more tests
b0428f3b
itazap fix for processors
8fe6873a
itazap fixing more models
c1e0e461
itazap deleted mbart50 by accident
79568cdd
itazap seamless m4t
cfa159a3
itazap albert fix
5854f4c8
itazap whisper
714a856e
itazap layout3
c016f114
itazap attempt to fix cached tokenizers on CI
2e3e1780
itazap trying another fix on CI
03e3ab9f
itazap again try to work around CI
2c30d79a
itazap bertweet
98f51d55
itazap tapas
96f0517c
itazap mbart50
c26f54b8
itazap luke
da0bbf0c
itazap mluke
494ef3e3
itazap markuplm
39bb8847
itazap markuplm
960dfcf3
itazap fix some more auto tests
54992a07
itazap some random model failures
d0383bdb
itazap mistralcommontestser
a969c6b6
itazap more fixes
2bf4a13c
itazap ref fix
e88322fb
itazap siglip
cfb0100a
itazap marian
0fd10662
itazap plbart
02c524c2
itazap update utils toks
820191e6
itazap seamless m4t
0cd714d0
itazap roc bert
8a412bc7
itazap update byt5 test
e8c32585
itazap xlm
85a3b1f6
itazap esm
45e718f7
itazap roformer
96fc4675
itazap code llama
7727e3b5
itazap biogpt
6795515d
itazap m2m100
2f49a392
itazap dpr and flaubert
a42e7a81
itazap xlm and speech to text
33634bef
itazap tok backend pass object
ca5e3891
itazap tokenizer object pass
25021d4d
itazap wav2vec2
69610fec
itazap wav2vec2
51799caf
itazap cpmant
f23abc3e
itazap update utils tokenizers
88f0db5c
itazap cpmant
077e6f88
itazap bartpho
e004b56b
itazap test apply chat template assistant mask
e069763c
itazap apply chat template video
9df9cfc5
itazap apply chat template assistant mask
dc9b1aec
itazap test torch
4c05e9df
itazap update from slow in base and fix donut processor errors
5c209a40
itazap auto to point to tokenizers backend, fix kosmos2
d8a8db8e
itazap some non model fixes for old slow models that no longer have their ow…
6b40d915
itazap missed file from last commit
976265bc
itazap idefics2
b6ca8b25
ArthurZucker fixup
5c721057
ArthurZucker fixup
964b461b
itazap pretrained tokenizer fast test update
03814073
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
887b4776
ArthurZucker stash
f4c46ab5
ArthurZucker Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
efbbb043
ArthurZucker bad merged
71ef2822
ArthurZucker cherry pick more stuff that did not merge well
a5b018c8
ArthurZucker fix gptsw3
8ea91f65
ArthurZucker nit warn for now
19478948
ArthurZucker update error raising
20a06ffe
ArthurZucker just ran fixup
aa197a04
ArthurZucker bring back bert legacy
63c7c1c2
ArthurZucker fix
5895bab5
ArthurZucker nit
6b8217b6
ArthurZucker fix 56 errors on blenderbotsmall?
184ed581
ArthurZucker 18 for blenderbotsmall
09e4021f
itazap tok auto
a8c299e7
itazap missed clip
12590525
itazap fix tests
06e3485a
itazap something missed
3a95bf18
itazap token healing
05d5c08c
itazap tok common tests update - nonmodel
78f4e586
itazap try to fix non-model test in test_tokenization_utils
8fbaf836
itazap fix hub tests
fd40b1ba
itazap try to fix hub tests
70330b85
itazap custom vocab related fixed
7c780070
itazap bert jap
ca1f6b09
itazap BERT JAP
dd3ae59a
itazap rename bert legacy to bert legacy
2e1893f7
itazap Wav2vec2
f4be6a90
itazap fix in tok python to update total vocab size - fixes speech t5
919103ac
itazap blender bot small
c452f924
itazap forgot test file
6d167eb9
itazap test failures
025722be
itazap marian
7d1d0d33
itazap gpt2 tiktoken
dfb67a42
itazap big bird / marian
51da6b28
itazap udop
c611058e
itazap forgot couple changes
cc4a9721
itazap test_serve fix
51202daa
itazap missing import
ca988b90
itazap a couple processors fixes
f5bc69ef
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into one_t…
c67de105
ArthurZucker style partly
045bbffa
itazap fix to fetch tests ci
75662fd4
itazap Revert branch back to commit f5bc69ef state
8d248a39
itazap revert branch to styling
4c299246
itazap update mistral after merge
189cabd5
itazap fixes for non model tests
e02741c5
itazap some processor test fixes
b828ae16
itazap more processor test fixes
83b579cf
itazap more processor fixes
2ce27bcd
itazap hub tests
881b97cf
itazap python tok utils
2e28b3da
itazap fix hub test
925d1873
itazap initial clean
ab1df03f
Base automatically changed from one_tokenizer to main 76 days ago
