transformers
use `TokenizersBackend`
#42894
Merged
ArthurZucker merged 61 commits into main from fix-tokenizer-auto
itazap commented on 2025-12-18
itazap force pushed from 6a942ded to b71b2457 25 days ago
us `TokenizersBackend`
d26fd7c3
fixes
bbf72ad0
pioritize mapping
fb5a8ace
pioritize mapping
d412f4c4
only use mapping for some models
4c2d7b82
fix fallback
9434defa
undo debug thing
79ce0a58
add case to tokenizersbackend init
9cfb636a
add default bos eos token to tok backend
de02e4af
set bos eos
8b33c756
fix more models
68737883
mistrla idefics
4afaea9f
fix stopping criteria test
ec7e88aa
fix stopping criteria test
498f00e8
try stopping criteria fix
08cf3471
rebase
a31bb4fe
itazap force pushed from 273d2cbf to a31bb4fe 9 days ago
update tokenizer model for stopping criteria test
305f4937
fix tuple mapping for ministral
d53fa278
Merge branch 'main' into fix-tokenizer-auto
f3ce355f
itazap marked this pull request as ready for review 8 days ago
itazap marked this pull request as draft 8 days ago
ArthurZucker marked this pull request as ready for review 7 days ago
ignore `tokenizer_class` as it is always wrong
5e30b8f3
Merge branch 'main' into fix-tokenizer-auto
f5e8296d
up
98363523
Merge branch 'fix-tokenizer-auto' of github.com:huggingface/transform…
4729ceff
try to fix idefics
e2337032
fix unispeech and maybe other: fallback if conversion was not possibl…
be8bb5e7
nits
4ec54f5a
fixup
e8ea6b7e
TIL that it was ALSO saved in config.json...
94c81422
arf
d38f86f7
fallback to tok config if no config json
c596ebee
people who map to Llama probably don't even want llama either..
32425238
Merge branch 'fix-tokenizer-auto' of github.com:huggingface/transform…
939cd416
processors to load tokbackend
8dee8651
auto fix order
dc92b342
try diff order
18600e41
mistral fix for weird chars
9306e4d6
reorder
8d34cf0d
random fix attempt for failing tests that are failing locally so idk …
9101e432
trying an older commit
35e48553
itazap force pushed from 92d4b4d9 to 35e48553 7 days ago
fix mistral
46d029b9
map unispeech
4d7b2b35
try something out
62723432
update
a2d6e03c
nits
b953a573
trying to be a little bit more restrictive
00d51f96
token type ids for tokenizers should be explicits... let's see which …
c17c7ae0
Nit
0ec73a35
idefics 1-2 are actually the only ones that should map to llama force
e732680c
small fixes
c2725f0c
fix layout
8b8f1668
fixup
15bf8c0a
fix some tests
d2dbfa62
1 nit
b601ae61
aria fix
24a94495
style
6ef783fc
canine
88d62cc8
fixup
ea537d83
very small test
f979d6cb
style
50c20f13
Merge branch 'main' of github.com:huggingface/transformers into fix-t…
618dadc6
update to tokenizersbackend
0354e33e
itazap approved these changes on 2026-01-07
ArthurZucker merged 9daee2e8 into main 6 days ago
ArthurZucker deleted the fix-tokenizer-auto branch 6 days ago
Reviewers
itazap
Assignees
No one assigned
Labels
None yet
Milestone
No milestone