llama.cpp
259a3ca9 - peg-parser: add unicode-aware trie for GBNF exclusion patterns

Commit

202 days ago

peg-parser: add unicode-aware trie for GBNF exclusion patterns

Fixes GBNF grammar generation for templates with Unicode tokens. DeepSeek R1/V3 use fullwidth vertical line (U+FF5C) and lower one eighth block (U+2581) in their special tokens. The byte-based trie was generating invalid UTF-8 prefixes in exclusion patterns.

Changes:
- Add unicode_trie that works with code points instead of bytes
- Update gbnf_escape_codepoint to handle non-ASCII characters
- Add tests for DeepSeek-style Unicode token formats

Enables: deepseek_r1:experimental, deepseek_v3_1:experimental

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
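To illustrate the idea behind the change, here is a minimal sketch of a trie keyed on code points rather than bytes. It is not the code from this commit: `decode_utf8`, `escape_cp`, `cp_trie`, and `exclusion_pattern` are hypothetical names chosen for the example, the decoder assumes well-formed UTF-8, and the escapes assume GBNF accepts `\xNN`, `\uNNNN`, and `\UNNNNNNNN`. The motivation is that U+FF5C encodes as the bytes `EF BD 9C`, so a byte-level prefix such as `<` followed by `EF` alone is not valid UTF-8, whereas a code-point-level prefix always is.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Assumption: input is well-formed
// (no checks for bad continuation bytes, overlongs, or surrogates).
std::vector<uint32_t> decode_utf8(const std::string & s) {
    std::vector<uint32_t> out;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        uint32_t cp;
        size_t len;
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        for (size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical escaper: ASCII letters and digits pass through, everything
// else becomes a hex escape so it is safe in both literals and char classes.
std::string escape_cp(uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if ((cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
        return std::string(1, static_cast<char>(cp));
    }
    char buf[16];
    if (cp < 0x80)         std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(cp));
    else if (cp <= 0xFFFF) std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(cp));
    else                   std::snprintf(buf, sizeof(buf), "\\U%08X", static_cast<unsigned>(cp));
    return buf;
}

// Trie whose edges are whole code points, so every prefix is valid text.
struct cp_trie_node {
    std::map<uint32_t, std::unique_ptr<cp_trie_node>> children;
    bool is_word = false;
};

struct cp_trie {
    cp_trie_node root;
    void insert(const std::string & word) {
        cp_trie_node * node = &root;
        for (uint32_t cp : decode_utf8(word)) {
            auto & child = node->children[cp];
            if (!child) child = std::make_unique<cp_trie_node>();
            node = child.get();
        }
        node->is_word = true;
    }
};

// For each proper prefix of a delimiter, emit: "<prefix>" [^<next code points>]
void collect(const cp_trie_node & node, const std::string & prefix, std::vector<std::string> & alts) {
    if (node.is_word || node.children.empty()) return; // full delimiter: stop here
    std::string cls = "[^";
    for (const auto & kv : node.children) cls += escape_cp(kv.first);
    cls += "]";
    alts.push_back(prefix.empty() ? cls : "\"" + prefix + "\" " + cls);
    for (const auto & kv : node.children) collect(*kv.second, prefix + escape_cp(kv.first), alts);
}

// Build a simplified "anything that does not start a delimiter" pattern.
std::string exclusion_pattern(const std::vector<std::string> & delimiters) {
    cp_trie trie;
    for (const auto & d : delimiters) trie.insert(d);
    std::vector<std::string> alts;
    collect(trie.root, "", alts);
    std::string out = "(";
    for (size_t i = 0; i < alts.size(); ++i) out += (i ? " | " : " ") + alts[i];
    return out + " )*";
}
```

For the single delimiter `<｜a` (where `｜` is U+FF5C), this sketch yields `( [^\x3C] | "\x3C" [^\uFF5C] | "\x3C\uFF5C" [^a] )*`: each alternative ends on a code point boundary, so no partial multi-byte sequence appears in the grammar. The pattern is deliberately simplified and does not handle every overlap case a production grammar generator would need to.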

References

#18353 - [WIP] tool-call: experimental migration of all parsers to peg-parser infra (w/ better test coverage)

Author

ochafik

Committer

ochafik

Parents

64ee5406
