llama.cpp
llama : fix pre-tokenization of non-special added tokens
#8228
Merged
compilade merged 14 commits into master from compilade/fix-mpt-pretok
llama : fix mpt and olmo pre-tokenizer
db2ffd51
compilade added the bugfix label
compilade added the Review Complexity : Low label
compilade requested a review from jaime-m-p 1 year ago
jaime-m-p commented on 2024-07-01
compilade marked this pull request as draft 1 year ago
compilade closed this 1 year ago
Merge branch 'master' into compilade/fix-mpt-pretok
ac0f33c9
llama : pre-tokenize non-special user-defined tokens first
d5d30b20
Merge branch 'master' into compilade/fix-mpt-pretok
6b961e3d
compilade reopened this 1 year ago
github-actions added the testing label
llama : fix detection of control-like user-defined tokens
56df1fcd
convert_hf : identify which user-defined tokens are control tokens
6e351e04
github-actions added the python label
convert_hf : identify more added control tokens for SPM tokenziers
f9d42c59
compilade changed the title from "llama : fix mpt and olmo pre-tokenizer" to "llama : fix pre-tokenization of non-special added tokens" 1 year ago
compilade removed the Review Complexity : Low label
compilade added the Review Complexity : Medium label
compilade added the generation quality label
compilade marked this pull request as ready for review 1 year ago
compilade commented on 2024-07-08
bartowski1182 commented on 2024-07-08
llama : fix Viking pre-tokenizer regex
31a1b0ee
llama : fix command-r detokenization
d6fe269c
jaime-m-p commented on 2024-07-08
jaime-m-p commented on 2024-07-09
convert_hf : reduce usages of the UNKNOWN token type
d4df7858
llama : add UNKNOWN tokens in the special tokens cache
98edea60
Merge branch 'master' into compilade/fix-mpt-pretok
afa61198
convert_hf : reduce usages of UNKNOWN for InternLM2
1caa20fc
oldgithubman requested changes on 2024-07-11
ggerganov approved these changes on 2024-07-12
ggerganov requested a review from jaime-m-p 1 year ago
test-tokenizer-random : reduce potential confilcts with #8379
59ce8531
compilade added the merge ready label
compilade merged fa79495b into master 1 year ago
Reviewers: ggerganov, oldgithubman, jaime-m-p, bartowski1182
Assignees: No one assigned
Labels: generation quality, testing, bugfix, python, Review Complexity : Medium, merge ready
Milestone: No milestone