PR #42415 initial clean

rm slow

d7af5a54

rm protobuf dependency

73be8c48

create_fast_tokenizer file

87cfea8a

move update post processor and add bos eos properties

d5e56bbd

llama

dc0611f7

simplify test

f2022400

handle blank tok

cacf09e8

save tests

26e08874

rm old common tests

e4b29559

llama refactored test - mixin temporary

ba3a0a46

add qwen2

117ce1dc

rm slow qwen2 tok

42d4e798

qwen2

7fb3d772

rm call_one and batch_encode_plus

21433e18

rm prepare_for_model

db8923c2

cohere

19138cbe

gemma

ec13e398

split up tests and remove common ones that should not be run for each…

6c25f26f

load PreTrainedSentencePieceTokenizer fallback

193684d9

rm functions dedicated for batched input

d411492d

spiece tests

3ee35253

cut base down

e0a260d5

cleaned up base to be more abstract for other backends to implement

82653f78

speed up added tokens

14d2a8ca

revert _pad

4980a2fd

rm specialtokenmixin and stale functions

19c9b098

rm pickle tests

a9263d1d

fixes missed

5fe5666c

gemma test fix

51e62e1f

refactor

0e5dbdf4

rm legacy from llama

9136d3c8

added renaming

ab77f57b

add _model

36bc3ef6

update legacy

c4f045c4

update legacy

c80dd1db

fix docstring

790c0923

always load blank, then set _tokenizer if we have it

f4d956a2

new toks

b2c320c2

update all berttokenizer based models

0c3caff0

apply feedback - delete bert duplicates

d43412a3

more models --> fast only

48eeb50c

more convert_slow models

d3a3cbd6

fix common test refs

493f9e0b

updating fast only tokenizers

a51cea01

openai and pegasus

d9c1ec33

enable sentencepiecebackend

d879bc3e

more models

ca510297

code gen

132c617e

t5

ed5bf863

code gen tests

158b4448

speecht5

64eaf880

mbart

95f48d3f

mbart50

f3248d2c

more models

f3dd1030

more models

c66037d9

layoutlmv2

cb5e08b5

update tests

31590335

update tests

a14a45d3

update tests

7ca10f8b

pretrainedtokenizer

f5cbc494

whisper

72e8043f

whisper

3cd8e5b4

layoutxlm and storing backends

4bf2b85a

refactor sentencepiecebackend and additional_special_tokens

2ef0fd37

renaming tokenization_utils --> tokenization_python

5c7d347f

update tests

fcf67ff8

bert test

a8ccf164

blenderbot

ccca98e4

clip

c118c106

codegen

0f740815

code_llama

a11dba71

cohere

b678cde8

deberta, deberta v2, funnel

ea9a5465

gpt2

ffbdecf8

batch update tests

9f08ade7

pegasus qwen2 roberta

a7cd5c08

more models

b5b3cd98

layout tests

1250bcc8

some renaming

cf72cae4

fix references to utils_fast

4fafdcc2

fix refs

236f9f18

fix refs

cd743bfd

fix refs

0e7e5939

fix refs

2af6d2cb

fix refs

b58b7b1e

fix refs

518dcaf6

fix refs

0f2f4b6a

fix some tests

c8491486

regression

0d54bbd6

fix refs

81a140a5

fix refs

61366d6a

missed the most crucial file in my last commit

4374a66e

fix refs

df383d75

fix refs

b8035eca

fix refs

37e1b925

batch encode fix

9b45774d

fix some tests

a24856d8

BC for batch_decode bc too many refs

18688703

more tests

35dd2509

fix more tests

b0428f3b

fix for processors

8fe6873a

fixing more models

c1e0e461

deleted mbart50 by accident

79568cdd

seamless m4t

cfa159a3

albert fix

5854f4c8

whisper

714a856e

layout3

c016f114

attempt to fix cached tokenizers on CI

2e3e1780

trying another fix on CI

03e3ab9f

again try to work around CI

2c30d79a

bertweet

98f51d55

tapas

96f0517c

mbart50

c26f54b8

luke

da0bbf0c

mluke

494ef3e3

markuplm

39bb8847

markuplm

960dfcf3

fix some more auto tests

54992a07

some random model failures

d0383bdb

mistralcommontestser

a969c6b6

more fixes

2bf4a13c

ref fix

e88322fb

siglip

cfb0100a

marian

0fd10662

plbart

02c524c2

update utils toks

820191e6

seamless m4t

0cd714d0

roc bert

8a412bc7

update byt5 test

e8c32585

xlm

85a3b1f6

esm

45e718f7

roformer

96fc4675

code llama

7727e3b5

biogpt

6795515d

m2m100

2f49a392

dpr and flaubert

a42e7a81

xlm and speech to text

33634bef

tok backend pass object

ca5e3891

tokenizer object pass

25021d4d

wav2vec2

69610fec

wav2vec2

51799caf

cpmant

f23abc3e

update utils tokenizers

88f0db5c

cpmant

077e6f88

bartpho

e004b56b

test apply chat template assistant mask

e069763c

apply chat template video

9df9cfc5

apply chat template assistant mask

dc9b1aec

test torch

4c05e9df

update from slow in base and fix donut processor errors

5c209a40

auto to point to tokenizers backend, fix kosmos2

d8a8db8e

some non model fixes for old slow models that no longer have their ow…

6b40d915

missed file from last commit

976265bc

idefics2

b6ca8b25

fixup

5c721057

fixup

964b461b

pretrained tokenizer fast test update

03814073

Merge branch 'main' of github.com:huggingface/transformers into one_t…

887b4776

stash

f4c46ab5

Merge branch 'one_tokenizer' of github.com:huggingface/transformers i…

efbbb043

bad merge

71ef2822

cherry pick more stuff that did not merge well

a5b018c8

fix gptsw3

8ea91f65

nit warn for now

19478948

update error raising

20a06ffe

just ran fixup

aa197a04

bring back bert legacy

63c7c1c2

fix

5895bab5

nit

6b8217b6

fix 56 errors on blenderbotsmall?

184ed581

18 for blenderbotsmall

09e4021f

tok auto

a8c299e7

missed clip

12590525

fix tests

06e3485a

something missed

3a95bf18

token healing

05d5c08c

tok common tests update - nonmodel

78f4e586

try to fix non-model test in test_tokenization_utils

8fbaf836

fix hub tests

fd40b1ba

try to fix hub tests

70330b85

custom vocab related fixes

7c780070

bert jap

ca1f6b09

BERT JAP

dd3ae59a

rename bert legacy to bert legacy

2e1893f7

Wav2vec2

f4be6a90

fix in tok python to update total vocab size - fixes speech t5

919103ac

blender bot small

c452f924

forgot test file

6d167eb9

test failures

025722be

marian

7d1d0d33

gpt2 tiktoken

dfb67a42

big bird / marian

51da6b28

udop

c611058e

forgot couple changes

cc4a9721

test_serve fix

51202daa

missing import

ca988b90

a couple processors fixes

f5bc69ef

Merge branch 'main' of github.com:huggingface/transformers into one_t…

c67de105

style partly

045bbffa

fix to fetch tests ci

75662fd4

Revert branch back to commit f5bc69ef state

8d248a39

revert branch to styling

4c299246

update mistral after merge

189cabd5

fixes for non model tests

e02741c5

some processor test fixes

b828ae16

more processor test fixes

83b579cf

more processor fixes

2ce27bcd

hub tests

881b97cf

python tok utils

2e28b3da

fix hub test

925d1873

initial clean

ab1df03f

Base automatically changed from one_tokenizer to main 78 days ago

transformers
initial clean
#42415

Open
