transformers
55cc1a7f - fix: set `clean_up_tokenization_spaces=False` in Llama 3 tokenizer conversion (#44914)

Commit
35 days ago
fix: set `clean_up_tokenization_spaces=False` in Llama 3 tokenizer conversion (#44914)

The `Llama3Converter` hardcodes `clean_up_tokenization_spaces=True`, which applies BERT-era string replacements (` .` → `.`, ` !` → `!`, etc.) to decoded output. These replacements silently corrupt text decoded by Llama 3's BPE tokenizer. Llama 2's `LlamaTokenizer` and Llama 4 both use `False`. The `True` value was introduced in PR #30334 and hardcoded in PR #33778 for backward compatibility.

Fixes #35175
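To illustrate the kind of corruption described above, here is a minimal standalone sketch of the cleanup step. It does not call the library: `clean_up_sketch` and `REPLACEMENTS` are hypothetical names, and only the ` .` and ` !` rules come from the commit message. The ` ,` and ` ?` rules are assumed to be among the "etc." cases.

```python
# Illustrative subset of the space-before-punctuation replacements.
# Only " ." and " !" are named in the commit message; the other two
# entries are assumptions added for the example.
REPLACEMENTS = [
    (" .", "."),
    (" !", "!"),
    (" ,", ","),
    (" ?", "?"),
]


def clean_up_sketch(text: str) -> str:
    """Apply each replacement in order, the way a post-decode cleanup pass would."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


# A string whose spacing is meaningful (e.g. code or structured output).
decoded = "a .b , c !"
cleaned = clean_up_sketch(decoded)

# With cleanup enabled, the round trip no longer reproduces the input.
print(repr(decoded))
print(repr(cleaned))
print("round trip preserved:", decoded == cleaned)
```

Because a BPE tokenizer already encodes spaces exactly, a pass like this can only remove spaces that were present in the original text, which is why disabling it is the safer default for this converter.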