llama.cpp
llama : more tokenizer fixes
#2810
Merged

ggerganov merged 13 commits into master from fix-tokenizer
ggerganov tests : write a Python tokenizer test (wip)
5cad62bc
ggerganov llama : prefix input text for tokenization with whitespace
5d0ffb69
klosax commented on 2023-08-26
ggerganov llama : distinguish pieces from decoded text + fix detokenization
9668aa11
ggerganov force pushed 2 years ago
ggerganov common : add comments
1e7a033f
ggerganov force pushed to 1e7a033f 2 years ago
ggerganov examples : no longer manually add leading space when tokenizing
dfa058ef
ggerganov tests : use Python to generate tokenizer tests for C++
70005bd5
ggerganov tests : add option to tokenize text files
e4324cbd
ggerganov force pushed to e4324cbd 2 years ago
ggerganov marked this pull request as ready for review 2 years ago
ggerganov requested a review from SlyEcho 2 years ago
ggerganov requested a review from ikawrakow 2 years ago
ggerganov requested a review from klosax 2 years ago
SlyEcho approved these changes on 2023-08-26
ggerganov commented on 2023-08-26
klosax approved these changes on 2023-08-26
ggerganov tests : add test-tokenizer-1.py
eb8b3264
ggerganov Merge branch 'master' into fix-tokenizer
c7677463
klosax llama.cpp : fix LF token
ab3ba64f
ggerganov hellaswag : move the concat space for clarity
dbcf470b
ggerganov force pushed to dbcf470b 2 years ago
ikawrakow approved these changes on 2023-08-27
ggerganov tests : add falcon tests (py + cpp, currently do not pass Unicode)
3bb0f849
ggerganov force pushed to 3bb0f849 2 years ago
ggerganov common : temporary separate llama_detokenize calls for SPM and BPE
841983fe
ggerganov merged edd4c148 into master 2 years ago
ggerganov deleted the fix-tokenizer branch 2 years ago
