🔴🔴🔴 fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast` #44915
fix: skip clean_up_tokenization for BPE tokenizers
578eddd4
test: add test for BPE tokenizer skipping clean_up_tokenization
2993578e
fix: update tests to expect BPE cleanup skip
9c1a6f1a
fix: move BPE test to correct class, use clean roundtrip text
0cce53ab
fix: add leading space to test string for ByteLevel BPE prefix
247ee6ff
feat: add escape hatch for BPE cleanup override
ddca57ec
Merge branch 'main' into fix/skip-cleanup-for-bpe
3f0769d3
Merge branch 'main' into fix/skip-cleanup-for-bpe
ddaa381b
Merge branch 'main' into fix/skip-cleanup-for-bpe
91c9946f
maxsloef-goodfire changed the title from "fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`" to "🔴🔴🔴 fix: skip `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`" 30 days ago
test: add llama 3 regression test for BPE clean_up_tokenization_space…
277ba510
itazap merged bbb51c83 into main 25 days ago
Assignees
No one assigned