transformers
One tok typing
#42437
Open
itazap wants to merge 271 commits into main from one_tok_typing
cut base down
e0a260d5
cleaned up base to be more abstract for other backends to implement
82653f78
speed up added tokens
14d2a8ca
revert _pad
4980a2fd
rm specialtokenmixin and stale functions
19c9b098
rm pickle tests
a9263d1d
fixes missed
5fe5666c
gemma test fix
51e62e1f
refactor
0e5dbdf4
rm legacy from llama
9136d3c8
added renaming
ab77f57b
add _model
36bc3ef6
update legacy
c4f045c4
update legacy
c80dd1db
fix docstring
790c0923
always load blank, then set _tokenizer if we have it
f4d956a2
new toks
b2c320c2
update all berttokenizer based models
0c3caff0
apply feedback - delete bert duplicates
d43412a3
more models --> fast only
48eeb50c
more convert_slow models
d3a3cbd6
fix common test refs
493f9e0b
updating fast only tokenizers
a51cea01
openai and pegasus
d9c1ec33
enable sentencepiecebackend
d879bc3e
more models
ca510297
code gen
132c617e
t5
ed5bf863
code gen tests
158b4448
speecht5
64eaf880
mbart
95f48d3f
mbart50
f3248d2c
more models
f3dd1030
more models
c66037d9
layoutlmv2
cb5e08b5
update tests
31590335
update tests
a14a45d3
update tests
7ca10f8b
pretrainedtokenizer
f5cbc494
whisper
72e8043f
whisper
3cd8e5b4
layoutxlm and storing backends
4bf2b85a
refactor sentencepiecebackend and additional_special_tokens
2ef0fd37
renaming tokenization_utils --> tokenization_python
5c7d347f
update tests
fcf67ff8
bert test
a8ccf164
blenderbot
ccca98e4
clip
c118c106
codegen
0f740815
code_llama
a11dba71
cohere
b678cde8
deberta, deberta v2, funnel
ea9a5465
gpt2
ffbdecf8
batch update tests
9f08ade7
pegasus qwen2 roberta
a7cd5c08
more models
b5b3cd98
layout tests
1250bcc8
some renaming
cf72cae4
fix references to utils_fast
4fafdcc2
fix refs
236f9f18
fix refs
cd743bfd
fix refs
0e7e5939
fix refs
2af6d2cb
fix refs
b58b7b1e
fix refs
518dcaf6
fix refs
0f2f4b6a
fix some tests
c8491486
regression
0d54bbd6
fix refs
81a140a5
fix refs
61366d6a
missed the most crucial file in my last commit
4374a66e
fix refs
df383d75
fix refs
b8035eca
fix refs
37e1b925
batch encode fix
9b45774d
fix some tests
a24856d8
BC for batch_decode bc too many refs
18688703
more tests
35dd2509
fix more tests
b0428f3b
fix for processors
8fe6873a
fixing more models
c1e0e461
deleted mbart50 by accident
79568cdd
seamless m4t
cfa159a3
albert fix
5854f4c8
whisper
714a856e
layout3
c016f114
attempt to fix cached tokenizers on CI
2e3e1780
trying another fix on CI
03e3ab9f
again try to work around CI
2c30d79a
bertweet
98f51d55
tapas
96f0517c
mbart50
c26f54b8
luke
da0bbf0c
mluke
494ef3e3
markuplm
39bb8847
markuplm
960dfcf3
fix some more auto tests
54992a07
some random model failures
d0383bdb
mistralcommontestser
a969c6b6
more fixes
2bf4a13c
ref fix
e88322fb
siglip
cfb0100a
marian
0fd10662
plbart
02c524c2
update utils toks
820191e6
seamless m4t
0cd714d0
roc bert
8a412bc7
update byt5 test
e8c32585
xlm
85a3b1f6
esm
45e718f7
roformer
96fc4675
code llama
7727e3b5
biogpt
6795515d
m2m100
2f49a392
dpr and flaubert
a42e7a81
xlm and speech to text
33634bef
tok backend pass object
ca5e3891
tokenizer object pass
25021d4d
wav2vec2
69610fec
wav2vec2
51799caf
cpmant
f23abc3e
update utils tokenizers
88f0db5c
cpmant
077e6f88
bartpho
e004b56b
test apply chat template assistant mask
e069763c
apply chat template video
9df9cfc5
apply chat template assistant mask
dc9b1aec
test torch
4c05e9df
update from slow in base and fix donut processor errors
5c209a40
auto to point to tokenizers backend, fix kosmos2
d8a8db8e
some non model fixes for old slow models that no longer have their ow…
6b40d915
missed file from last commit
976265bc
idefics2
b6ca8b25
fixup
5c721057
fixup
964b461b
pretrained tokenizer fast test update
03814073
Merge branch 'main' of github.com:huggingface/transformers into one_t…
887b4776
stash
f4c46ab5
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
efbbb043
bad merged
71ef2822
cherry pick more stuff that did not merge well
a5b018c8
fix gptsw3
8ea91f65
nit warn for now
19478948
update error raising
20a06ffe
just ran fixup
aa197a04
bring back bert legacy
63c7c1c2
fix
5895bab5
nit
6b8217b6
fix 56 errors on blenderbotsmall?
184ed581
18 for blenderbotsmall
09e4021f
tok auto
a8c299e7
missed clip
12590525
fix tests
06e3485a
something missed
3a95bf18
token healing
05d5c08c
tok common tests update - nonmodel
78f4e586
try to fix non-model test in test_tokenization_utils
8fbaf836
fix hub tests
fd40b1ba
try to fix hub tests
70330b85
custom vocab related fixed
7c780070
bert jap
ca1f6b09
BERT JAP
dd3ae59a
rename bert legacy to bert legacy
2e1893f7
Wav2vec2
f4be6a90
fix in tok python to update total vocab size - fixes speech t5
919103ac
blender bot small
c452f924
forgot test file
6d167eb9
test failures
025722be
marian
7d1d0d33
gpt2 tiktoken
dfb67a42
big bird / marian
51da6b28
udop
c611058e
forgot couple changes
cc4a9721
test_serve fix
51202daa
missing import
ca988b90
a couple processors fixes
f5bc69ef
Merge branch 'main' of github.com:huggingface/transformers into one_t…
c67de105
style partly
045bbffa
fix to fetch tests ci
75662fd4
Revert branch back to commit f5bc69ef state
8d248a39
revert branch to styling
4c299246
update mistral after merge
189cabd5
fixes for non model tests
e02741c5
some processor test fixes
b828ae16
more processor test fixes
83b579cf
more processor fixes
2ce27bcd
hub tests
881b97cf
python tok utils
2e28b3da
fix hub test
925d1873
Merge branch 'main' of github.com:huggingface/transformers into one_t…
66242316
make style for now
437321b8
remove problematic fix copies
cd4d3ac9
python utils/check_copies.py --fix_and_overwrite
5c5864f5
more styling
2f13c132
fixup
1e1aa11c
silence docstring
5eeb1fed
fix import?
dea8e1ef
fix imports
452d6d88
add the local test as well
e6502059
throw spm error
3dd17161
llamas
e700dfa7
fix a couple tests
ce23d672
broke ci
ff1bf368
broke ci
0bdfeae1
broke ci
a1376493
broke ci
366597c9
add logs to debug gemma on ci
22887b1c
gemma and llama
73819f44
gemma
c24c9970
revert last commit
551a959b
gemma debug
a18e84dc
gemma debug
c23ee139
gemma
93187b3e
safely import spiece backend
81428ef7
tok tests
eb95c2e8
check none
24d89c4c
setup and qual
e2c44345
ruff
7a737b77
del dev files
a19c90c1
tok auto
18e74845
fill docstrings
3cdd8ee8
clean vocab typing
25bd5a8b
update auto
50756c49
blenderbot small nit
6bccb46c
Merge branch 'main' of github.com:huggingface/transformers into one_t…
a76015ab
add migration guide
4afb5706
move mixtral patch to `TokenizersBackend`, move `TokenizerExtractor`
be1d95a1
rename MistralCommonTokenizer to MistralCommonBackend
fad31d7c
Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…
d4aff20f
nit
3ab4becd
Merge branch 'main' of github.com:huggingface/transformers into one_t…
0c1a40a5
fix failures
30f16402
fixup
f2a14826
remove one old test
d8010f85
mark the slow one as slow
82e56759
very small fixes
088fc39a
update auto mapping for missing ones
f677ddf7
fixup lorsd
d30e46b7
fixup doc and stuff
ad24f43c
should be the final fix
ebfe7f19
processing update
c4a743d2
Merge branch 'main' of github.com:huggingface/transformers into one_t…
f81a9668
update
9a5638dd
FIX or brute AI fix the llava test
7c32dfbb
style
c520a66a
slow?
718b2f03
Merge branch 'main' of github.com:huggingface/transformers into one_t…
20d9036e
fix is offline mode?
8f536c2e
ArthurZucker changed the base branch from main to one_tokenizer 189 days ago
Merge branch 'one_tokenizer' into one_tok_typing
4650fd24
model_type
4337eaf0
Base automatically changed from one_tokenizer to main 189 days ago
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Milestone
No milestone