llama.cpp
Tokenizer WPM fixes for bert-bge and jina-v2-en
#7500
Merged
jaime-m-p merged 5 commits into ggml-org:master from jaime-m-p:tokenizer-wpm-fixes
github-actions added testing
github-actions added python
mofosyne added Review Complexity : Medium
Update random test: add_bos_token (2a38e5fa)
Add WPM models for testing (af45703f)
Build vocab.special_tokens_cache using vocab token types (938cb494)
Fix and improve preprocessing (117b0910)
Discard all tokens when no matching found (f3f6c0a9)
jaime-m-p force pushed from e92c3f89 to f3f6c0a9 1 year ago
ggerganov approved these changes on 2024-05-28
jaime-m-p merged 02c1ecad into master 1 year ago
Reviewers: ggerganov
Assignees: No one assigned
Labels: testing, python, Review Complexity : Medium
Milestone: No milestone