transformers
initial clean #42415 (Open)
itazap wants to merge 211 commits into main from update_special_tokens

Commits (211):
rm slow
d7af5a54
rm protobuf dependency
73be8c48
create_fast_tokenizer file
87cfea8a
move update post processor and add bos eos properties
d5e56bbd
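The post-processor commit above is about keeping bos/eos handling on the fast backend. Below is a minimal sketch of rebuilding a post-processor from bos/eos flags with the `tokenizers` library; the function name and wiring are assumptions based on the commit title, not necessarily the PR's code.

```python
# Illustrative sketch only (names assumed from the commit title, not taken from the PR):
# rebuild a TemplateProcessing post-processor whenever add_bos/add_eos change.
from tokenizers import processors

def update_post_processor(tokenizer, add_bos: bool = True, add_eos: bool = False):
    bos, eos = tokenizer.bos_token, tokenizer.eos_token
    single = f"{bos + ' ' if add_bos else ''}$A{' ' + eos if add_eos else ''}"
    pair = f"{single} {bos + ' ' if add_bos else ''}$B{' ' + eos if add_eos else ''}"
    special_tokens = []
    if add_bos:
        special_tokens.append((bos, tokenizer.bos_token_id))
    if add_eos:
        special_tokens.append((eos, tokenizer.eos_token_id))
    # The underlying `tokenizers.Tokenizer` exposes post_processor as a settable property.
    tokenizer._tokenizer.post_processor = processors.TemplateProcessing(
        single=single, pair=pair, special_tokens=special_tokens
    )
```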
llama
dc0611f7
simplify test
f2022400
handle blank tok
cacf09e8
save tests
26e08874
rm old common tests
e4b29559
llama refactored test - mixin temporary
ba3a0a46
add qwen2
117ce1dc
rm slow qwen2 tok
42d4e798
qwen2
7fb3d772
rm call_one and batch_encode_plus
21433e18
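Dropping `batch_encode_plus` matters for downstream callers, since the tokenizer's `__call__` already covers both single and batched inputs. A small usage sketch of the replacement call (the checkpoint name is only illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint

# Old style, removed in this PR:
#   tok.batch_encode_plus(["hello world", "hi"], padding=True, return_tensors="pt")

# Replacement: call the tokenizer directly; __call__ handles strings and batches alike.
batch = tok(["hello world", "hi"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
```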
rm prepare_for_model
db8923c2
cohere
19138cbe
gemma
ec13e398
split up tests and remove common ones that should not be run for each…
6c25f26f
load PreTrainedSentencePieceTokenizer fallback
193684d9
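The fallback commit above suggests a load path that prefers the `tokenizers` backend and only drops to a SentencePiece-backed class when no serialized fast tokenizer is available. A hypothetical sketch of that control flow; the `PreTrainedSentencePieceTokenizer` name comes from the commit title, and its actual constructor and import path in this PR may differ:

```python
# Hypothetical sketch; assumes PreTrainedSentencePieceTokenizer is importable from this
# PR's branch, alongside the standard PreTrainedTokenizerFast.
from transformers import PreTrainedTokenizerFast

def load_with_fallback(name_or_path, **kwargs):
    try:
        # Preferred path: a serialized tokenizer.json handled by the `tokenizers` backend.
        return PreTrainedTokenizerFast.from_pretrained(name_or_path, **kwargs)
    except (OSError, ValueError):
        # Fallback path: load directly from a SentencePiece model file
        # (e.g. spiece.model / tokenizer.model), as the commit title implies.
        return PreTrainedSentencePieceTokenizer.from_pretrained(name_or_path, **kwargs)
```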
rm functions dedicated for batched input
d411492d
spiece tests
3ee35253
cut base down
e0a260d5
cleaned up base to be more abstract for other backends to implement
82653f78
speed up added tokens
14d2a8ca
revert _pad
4980a2fd
rm specialtokenmixin and stale functions
19c9b098
rm pickle tests
a9263d1d
fixes missed
5fe5666c
gemma test fix
51e62e1f
refactor
0e5dbdf4
rm legacy from llama
9136d3c8
added renaming
ab77f57b
add _model
36bc3ef6
update legacy
c4f045c4
update legacy
c80dd1db
fix docstring
790c0923
always load blank, then set _tokenizer if we have it
f4d956a2
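The "load blank, then set `_tokenizer`" pattern named above can be read as: always construct the Python wrapper first, and only attach a serialized `tokenizers` backend when one exists on disk. A minimal sketch under that assumption (helper and attribute names are illustrative, not the PR's exact code):

```python
import os
from tokenizers import Tokenizer

def build_tokenizer(cls, pretrained_dir, **kwargs):
    # Always instantiate the wrapper "blank", with no backend attached yet.
    wrapper = cls(**kwargs)
    tokenizer_file = os.path.join(pretrained_dir, "tokenizer.json")
    if os.path.isfile(tokenizer_file):
        # Attach the serialized backend only if we actually have one.
        wrapper._tokenizer = Tokenizer.from_file(tokenizer_file)
    return wrapper
```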
new toks
b2c320c2
update all berttokenizer based models
0c3caff0
apply feedback - delete bert duplicates
d43412a3
more models --> fast only
48eeb50c
more convert_slow models
d3a3cbd6
fix common test refs
493f9e0b
updating fast only tokenizers
a51cea01
openai and pegasus
d9c1ec33
enable sentencepiecebackend
d879bc3e
more models
ca510297
code gen
132c617e
t5
ed5bf863
code gen tests
158b4448
speecht5
64eaf880
mbart
95f48d3f
mbart50
f3248d2c
more models
f3dd1030
more models
c66037d9
layoutlmv2
cb5e08b5
update tests
31590335
update tests
a14a45d3
update tests
7ca10f8b
pretrainedtokenizer
f5cbc494
whisper
72e8043f
whisper
3cd8e5b4
layoutxlm and storing backends
4bf2b85a
refactor sentencepiecebackend and additional_special_tokens
2ef0fd37
renaming tokenization_utils --> tokenization_python
5c7d347f
update tests
fcf67ff8
bert test
a8ccf164
blenderbot
ccca98e4
clip
c118c106
codegen
0f740815
code_llama
a11dba71
cohere
b678cde8
deberta, deberta v2, funnel
ea9a5465
gpt2
ffbdecf8
batch update tests
9f08ade7
pegasus qwen2 roberta
a7cd5c08
more models
b5b3cd98
layout tests
1250bcc8
some renaming
cf72cae4
fix references to utils_fast
4fafdcc2
fix refs
236f9f18
fix refs
cd743bfd
fix refs
0e7e5939
fix refs
2af6d2cb
fix refs
b58b7b1e
fix refs
518dcaf6
fix refs
0f2f4b6a
fix some tests
c8491486
regression
0d54bbd6
fix refs
81a140a5
fix refs
61366d6a
missed the most crucial file in my last commit
4374a66e
fix refs
df383d75
fix refs
b8035eca
fix refs
37e1b925
batch encode fix
9b45774d
fix some tests
a24856d8
BC for batch_decode because too many refs
18688703
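Keeping `batch_decode` for backward compatibility amounts to a thin wrapper that forwards each sequence to `decode`; a minimal sketch of that shim:

```python
def batch_decode(self, sequences, skip_special_tokens=False, **kwargs):
    # Backward-compatible shim: existing call sites keep working while decode()
    # remains the single underlying implementation.
    return [
        self.decode(seq, skip_special_tokens=skip_special_tokens, **kwargs)
        for seq in sequences
    ]
```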
more tests
35dd2509
fix more tests
b0428f3b
fix for processors
8fe6873a
fixing more models
c1e0e461
deleted mbart50 by accident
79568cdd
seamless m4t
cfa159a3
albert fix
5854f4c8
whisper
714a856e
layout3
c016f114
attempt to fix cached tokenizers on CI
2e3e1780
trying another fix on CI
03e3ab9f
again try to work around CI
2c30d79a
bertweet
98f51d55
tapas
96f0517c
mbart50
c26f54b8
luke
da0bbf0c
mluke
494ef3e3
markuplm
39bb8847
markuplm
960dfcf3
fix some more auto tests
54992a07
some random model failures
d0383bdb
mistralcommontester
a969c6b6
more fixes
2bf4a13c
ref fix
e88322fb
siglip
cfb0100a
marian
0fd10662
plbart
02c524c2
update utils toks
820191e6
seamless m4t
0cd714d0
roc bert
8a412bc7
update byt5 test
e8c32585
xlm
85a3b1f6
esm
45e718f7
roformer
96fc4675
code llama
7727e3b5
biogpt
6795515d
m2m100
2f49a392
dpr and flaubert
a42e7a81
xlm and speech to text
33634bef
tok backend pass object
ca5e3891
tokenizer object pass
25021d4d
wav2vec2
69610fec
wav2vec2
51799caf
cpmant
f23abc3e
update utils tokenizers
88f0db5c
cpmant
077e6f88
bartpho
e004b56b
test apply chat template assistant mask
e069763c
apply chat template video
9df9cfc5
apply chat template assistant mask
dc9b1aec
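The assistant-mask tests exercise `apply_chat_template` with `return_assistant_tokens_mask=True`, which only yields a non-trivial mask when the model's chat template marks assistant spans with `{% generation %}` blocks. A usage sketch (the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-chat-model")  # placeholder; template must contain {% generation %}
messages = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
out = tok.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_assistant_tokens_mask=True,
)
# 1s mark tokens produced in assistant turns, 0s everything else.
print(out["assistant_masks"])
```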
test torch
4c05e9df
update from slow in base and fix donut processor errors
5c209a40
auto to point to tokenizers backend, fix kosmos2
d8a8db8e
some non model fixes for old slow models that no longer have their ow…
6b40d915
missed file from last commit
976265bc
idefics2
b6ca8b25
fixup
5c721057
fixup
964b461b
pretrained tokenizer fast test update
03814073
Merge branch 'main' of github.com:huggingface/transformers into one_t…
887b4776
stash
f4c46ab5
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
efbbb043
bad merge
71ef2822
cherry pick more stuff that did not merge well
a5b018c8
fix gptsw3
8ea91f65
nit warn for now
19478948
update error raising
20a06ffe
just ran fixup
aa197a04
bring back bert legacy
63c7c1c2
fix
5895bab5
nit
6b8217b6
fix 56 errors on blenderbotsmall?
184ed581
18 for blenderbotsmall
09e4021f
tok auto
a8c299e7
missed clip
12590525
fix tests
06e3485a
something missed
3a95bf18
token healing
05d5c08c
tok common tests update - nonmodel
78f4e586
try to fix non-model test in test_tokenization_utils
8fbaf836
fix hub tests
fd40b1ba
try to fix hub tests
70330b85
custom vocab related fixed
7c780070
bert japanese
ca1f6b09
bert japanese
dd3ae59a
rename bert legacy to bert legacy
2e1893f7
Wav2vec2
f4be6a90
fix in tok python to update total vocab size - fixes speech t5
919103ac
blender bot small
c452f924
forgot test file
6d167eb9
test failures
025722be
marian
7d1d0d33
gpt2 tiktoken
dfb67a42
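For the GPT-2/tiktoken commit, the natural sanity check is that the Hugging Face GPT-2 tokenizer and tiktoken's `gpt2` encoding agree on token IDs; a small parity sketch (how the PR actually wires tiktoken in may differ):

```python
import tiktoken
from transformers import AutoTokenizer

hf_tok = AutoTokenizer.from_pretrained("gpt2")
tt_enc = tiktoken.get_encoding("gpt2")

text = "Hello world"
# Both use the same GPT-2 BPE vocabulary, so the IDs should match.
assert hf_tok.encode(text) == tt_enc.encode(text)
```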
big bird / marian
51da6b28
udop
c611058e
forgot couple changes
cc4a9721
test_serve fix
51202daa
missing import
ca988b90
a couple processors fixes
f5bc69ef
Merge branch 'main' of github.com:huggingface/transformers into one_t…
c67de105
style partly
045bbffa
fix to fetch tests ci
75662fd4
Revert branch back to commit f5bc69ef state
8d248a39
revert branch to styling
4c299246
update mistral after merge
189cabd5
fixes for non model tests
e02741c5
some processor test fixes
b828ae16
more processor test fixes
83b579cf
more processor fixes
2ce27bcd
hub tests
881b97cf
python tok utils
2e28b3da
fix hub test
925d1873
initial clean
ab1df03f
Base automatically changed from one_tokenizer to main 76 days ago
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Milestone: No milestone