Fix BPE tokenization for weird whitespace characters (Closes #199) #208
Add new tokenizer unit test (#199)
5d482218
Perform `NFKC` normalization for sentencepiece models w/ precompiled …
c9e1eeaf
Fix JSDoc indentation
c55833d0
Add problematic string to unit tests
a0ec3c29
Use consistent BPE split token
1a9f9de8
Add second problematic string
e24a9fc2
xenova
merged
1165f04a
into main 2 years ago
xenova
deleted the bpe-freeze-fix branch 2 years ago
Assignees
No one assigned
Login to write a write a comment.
Login via GitHub