llama.cpp
259a3ca9 - peg-parser: add unicode-aware trie for GBNF exclusion patterns

Commit
104 days ago
peg-parser: add unicode-aware trie for GBNF exclusion patterns Fixes GBNF grammar generation for templates with Unicode tokens. DeepSeek R1/V3 use fullwidth vertical line (U+FF5C) and lower one eighth block (U+2581) in their special tokens. The byte-based trie was generating invalid UTF-8 prefixes in exclusion patterns. Changes: - Add unicode_trie that works with code points instead of bytes - Update gbnf_escape_codepoint to handle non-ASCII characters - Add tests for DeepSeek-style Unicode token formats Enables: deepseek_r1:experimental, deepseek_v3_1:experimental 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Author
Committer
Parents
Loading