transformers
🔴🔴🔴 fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`
#44915
Merged
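
The title's claim can be illustrated with a minimal sketch. It assumes the cleanup step is a fixed list of English-punctuation string replacements (reproduced below as a standalone function, not imported from the library) and that a byte-level BPE decoder already returns the original text exactly. `finish_decode` and its `model_type` argument are hypothetical names used only for illustration; they are not the code merged in this PR.

```python
def clean_up_tokenization(text: str) -> str:
    """Standalone copy of the punctuation cleanup heuristic (rule list is an assumption)."""
    replacements = [
        (" .", "."),
        (" ?", "?"),
        (" !", "!"),
        (" ,", ","),
        (" ' ", "'"),
        (" n't", "n't"),
        (" 'm", "'m"),
        (" 's", "'s"),
        (" 've", "'ve"),
        (" 're", "'re"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def finish_decode(decoded: str, model_type: str, clean_up: bool = True) -> str:
    """Hypothetical guard: apply the cleanup only when the model is not BPE."""
    if clean_up and model_type != "BPE":
        return clean_up_tokenization(decoded)
    # BPE decoders are assumed to preserve spacing already, so the text is returned unchanged.
    return decoded


# Text whose spacing around punctuation is intentional.
original = " x = a . b , c"

# With cleanup applied, the spaces before "." and "," are silently dropped.
lossy = finish_decode(original, model_type="WordPiece")

# With the BPE guard, decoding round-trips the text exactly.
exact = finish_decode(original, model_type="BPE")
```

Under these assumptions, the cleanup is lossy for any text that legitimately contains a space before punctuation, which is why skipping it for BPE preserves the encode/decode round trip.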

maxsloef-goodfire
maxsloef-goodfire fix: skip clean_up_tokenization for BPE tokenizers
578eddd4
maxsloef-goodfire test: add test for BPE tokenizer skipping clean_up_tokenization
2993578e
maxsloef-goodfire fix: update tests to expect BPE cleanup skip
9c1a6f1a
maxsloef-goodfire fix: move BPE test to correct class, use clean roundtrip text
0cce53ab
maxsloef-goodfire fix: add leading space to test string for ByteLevel BPE prefix
247ee6ff
ArthurZucker
ArthurZucker commented on 2026-03-23
maxsloef-goodfire feat: add escape hatch for BPE cleanup override
ddca57ec
maxsloef-goodfire force pushed from d11ef365 to ddca57ec 60 days ago
maxsloef-goodfire
maxsloef-goodfire requested a review from ArthurZucker 43 days ago
maxsloef-goodfire Merge branch 'main' into fix/skip-cleanup-for-bpe
3f0769d3
maxsloef-goodfire Merge branch 'main' into fix/skip-cleanup-for-bpe
ddaa381b
maxsloef-goodfire
maxsloef-goodfire Merge branch 'main' into fix/skip-cleanup-for-bpe
91c9946f
maxsloef-goodfire
ArthurZucker
ArthurZucker approved these changes on 2026-04-22
HuggingFaceDocBuilderDev
itazap
github-actions
github-actions
maxsloef-goodfire changed the title from "fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`" to "🔴🔴🔴 fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`" 30 days ago
maxsloef-goodfire
itazap
maxsloef-goodfire test: add llama 3 regression test for BPE clean_up_tokenization_space…
277ba510
maxsloef-goodfire
github-actions
itazap
itazap merged bbb51c83 into main 25 days ago

