transformers.js
Fix BPE tokenization for weird whitespace characters (Closes #199)
#208
Merged

Fix BPE tokenization for weird whitespace characters (Closes #199) #208

xenova merged 6 commits into main from bpe-freeze-fix
xenova
xenova Add new tokenizer unit test (#199)
5d482218
xenova Perform `NFKC` normalization for sentencepiece models w/ precompiled …
c9e1eeaf
xenova Fix JSDoc indentation
c55833d0
xenova Add problematic string to unit tests
a0ec3c29
xenova Use consistent BPE split token
1a9f9de8
xenova Add second problematic string
e24a9fc2
HuggingFaceDocBuilderDev
fozziethebeat
xenova xenova merged 1165f04a into main 2 years ago
xenova xenova deleted the bpe-freeze-fix branch 2 years ago
xenova

Login to write a write a comment.

Login via GitHub

Reviewers
No reviews
Assignees
No one assigned
Labels
Milestone