transformers
use `TokenizersBackend`
#42894
Merged

ArthurZucker merged 61 commits into main from fix-tokenizer-auto
ArthurZucker
HuggingFaceDocBuilderDev
itazap
itazap commented on 2025-12-18
itazap force pushed from 6a942ded to b71b2457 25 days ago
ArthurZucker us `TokenizersBackend`
d26fd7c3
ArthurZucker fixes
bbf72ad0
pioritize mapping
fb5a8ace
pioritize mapping
d412f4c4
only use mapping for some models
4c2d7b82
fix fallback
9434defa
undo debug thing
79ce0a58
add case to tokenizersbackend init
9cfb636a
add default bos eos token to tok backend
de02e4af
set bos eos
8b33c756
fix more models
68737883
mistrla idefics
4afaea9f
fix stopping criteria test
ec7e88aa
fix stopping criteria test
498f00e8
try stopping criteria fix
08cf3471
itazap rebase
a31bb4fe
itazap force pushed from 273d2cbf to a31bb4fe 9 days ago
itazap update tokenizer model for stopping criteria test
305f4937
itazap fix tuple mapping for ministral
d53fa278
itazap Merge branch 'main' into fix-tokenizer-auto
f3ce355f
itazap marked this pull request as ready for review 8 days ago
itazap marked this pull request as draft 8 days ago
ArthurZucker marked this pull request as ready for review 7 days ago
ArthurZucker ignore `tokenizer_class` as it is always wrong
5e30b8f3
ArthurZucker Merge branch 'main' into fix-tokenizer-auto
f5e8296d
ArthurZucker up
98363523
ArthurZucker Merge branch 'fix-tokenizer-auto' of github.com:huggingface/transform…
4729ceff
ArthurZucker try to fix idefics
e2337032
ArthurZucker fix unispeech and maybe other: fallback if conversion was not possibl…
be8bb5e7
ArthurZucker nits
4ec54f5a
ArthurZucker fixup
e8ea6b7e
ArthurZucker TIL that it was ALSO saved in config.json...
94c81422
ArthurZucker arf
d38f86f7
itazap fallback to tok config if no config json
c596ebee
ArthurZucker people who map to Llama probably don't even want llama either..
32425238
ArthurZucker Merge branch 'fix-tokenizer-auto' of github.com:huggingface/transform…
939cd416
itazap processors to load tokbackend
8dee8651
itazap auto fix order
dc92b342
itazap try diff order
18600e41
mistral fix for weird chars
9306e4d6
itazap reorder
8d34cf0d
random fix attempt for failing tests that are failing locally so idk …
9101e432
trying an older commit
35e48553
itazap force pushed from 92d4b4d9 to 35e48553 7 days ago
fix mistral
46d029b9
map unispeech
4d7b2b35
ArthurZucker try something out
62723432
ArthurZucker update
a2d6e03c
ArthurZucker nits
b953a573
ArthurZucker trying to be a little bit more restrictive
00d51f96
ArthurZucker token type ids for tokenizers should be explicits... let's see which …
c17c7ae0
ArthurZucker Nit
0ec73a35
ArthurZucker idefics 1-2 are actually the only ones that should map to llama force
e732680c
ArthurZucker small fixes
c2725f0c
ArthurZucker fix layout
8b8f1668
ArthurZucker fixup
15bf8c0a
ArthurZucker fix some tests
d2dbfa62
ArthurZucker 1 nit
b601ae61
ArthurZucker aria fix
24a94495
ArthurZucker style
6ef783fc
ArthurZucker canine
88d62cc8
ArthurZucker fixup
ea537d83
ArthurZucker very small test
f979d6cb
ArthurZucker style
50c20f13
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into fix-t…
618dadc6
update to tokenizersbackend
0354e33e
github-actions
itazap
itazap approved these changes on 2026-01-07
ArthurZucker merged 9daee2e8 into main 6 days ago
ArthurZucker deleted the fix-tokenizer-auto branch 6 days ago