llama.cpp
llama : fix pre-tokenization of non-special added tokens
#8228
Merged

compilade merged 14 commits into master from compilade/fix-mpt-pretok
compilade llama : fix mpt and olmo pre-tokenizer
db2ffd51
compilade added bugfix
compilade added Review Complexity : Low
compilade requested a review from jaime-m-p 1 year ago
jaime-m-p commented on 2024-07-01
compilade marked this pull request as draft 1 year ago
compilade closed this 1 year ago
compilade Merge branch 'master' into compilade/fix-mpt-pretok
ac0f33c9
compilade llama : pre-tokenize non-special user-defined tokens first
d5d30b20
compilade Merge branch 'master' into compilade/fix-mpt-pretok
6b961e3d
compilade reopened this 1 year ago
github-actions added testing
compilade llama : fix detection of control-like user-defined tokens
56df1fcd
compilade convert_hf : identify which user-defined tokens are control tokens
6e351e04
github-actions added python
compilade convert_hf : identify more added control tokens for SPM tokenziers
f9d42c59
compilade changed the title from "llama : fix mpt and olmo pre-tokenizer" to "llama : fix pre-tokenization of non-special added tokens" 1 year ago
compilade removed Review Complexity : Low
compilade added Review Complexity : Medium
compilade added generation quality
compilade marked this pull request as ready for review 1 year ago
compilade commented on 2024-07-08
bartowski1182 commented on 2024-07-08
compilade llama : fix Viking pre-tokenizer regex
31a1b0ee
compilade llama : fix command-r detokenization
d6fe269c
jaime-m-p commented on 2024-07-08
jaime-m-p commented on 2024-07-09
compilade convert_hf : reduce usages of the UNKNOWN token type
d4df7858
compilade llama : add UNKNOWN tokens in the special tokens cache
98edea60
compilade Merge branch 'master' into compilade/fix-mpt-pretok
afa61198
compilade convert_hf : reduce usages of UNKNOWN for InternLM2
1caa20fc
oldgithubman requested changes on 2024-07-11
ggerganov approved these changes on 2024-07-12
ggerganov requested a review from jaime-m-p 1 year ago
compilade test-tokenizer-random : reduce potential confilcts with #8379
59ce8531
compilade added merge ready
compilade merged fa79495b into master 1 year ago

