llama.cpp
Tokenizer WPM fixes for bert-bge and jina-v2-en
#7500
Merged

jaime-m-p
github-actions added the testing label
github-actions added the python label
mofosyne added the Review Complexity : Medium label
jaime-m-p
teleprint-me
iamlemec
jaime-m-p
teleprint-me
jaime-m-p
teleprint-me
Update random test: add_bos_token
2a38e5fa
Add WPM models for testing
af45703f
Build vocab.special_tokens_cache using vocab token types
938cb494
Fix and improve preprocessing
117b0910
Discard all tokens when no matching found
f3f6c0a9
jaime-m-p force-pushed from e92c3f89 to f3f6c0a9 1 year ago
github-actions
ggerganov
ggerganov approved these changes on 2024-05-28
jaime-m-p merged 02c1ecad into master 1 year ago
