llama.cpp
Work on the BPE tokenizer
#3252
Merged

goerch merged 35 commits into ggml-org:master from falcon-tokenizer
goerch
goerch Work on the BPE tokenizer
bfaab6f4
cebtenzzre
goerch Try to fix build problem
89e74c67
goerch Fix debug assertion failure
77704232
staviq
goerch Fix MSVC Unicode BOM problem
37cf135c
Green-Sky
goerch Cleanup and an improvement
91a527a0
goerch Fix compiler warning
208d3d7c
goerch Cleanup
407f76d9
goerch Test doesn't work over the full range of Unicodes
311fcf11
staviq
goerch Update .gitignore and Makefile
c85cb29b
goerch Another Makefile rule
048e659d
KerfuffleV2
goerch
KerfuffleV2
goerch
KerfuffleV2
goerch
KerfuffleV2
goerch Testing Aquila
c0990bb7
goerch
goerch
KerfuffleV2
goerch Moving byte decoding back to `token_to_piece` ...
1b7c3692
goerch
goerch Guarding some unusable code pathes
a4e9448e
goerch
KerfuffleV2
goerch
goerch Streamlining code and adding some more assertions
17ca8327
goerch Adding a comment
4abbfb51
KerfuffleV2
goerch
KerfuffleV2
goerch
KerfuffleV2
goerch
goerch Adding another assertion
59a30b76
slaren
goerch
KerfuffleV2
goerch Fixed vocabulary guarding assertions
a6070b7c
KerfuffleV2
KerfuffleV2 approved these changes on 2023-09-19
goerch
ggerganov
goerch
KerfuffleV2
ggerganov
cebtenzzre
cebtenzzre commented on 2023-09-27
cebtenzzre
cebtenzzre commented on 2023-09-27
goerch
goerch Merge branch 'master' into falcon-tokenizer
16c06fe2
goerch Fix PR for recent change
c09330ed
goerch Fix PR for recent change
9cfb7145
goerch Fix for compiler warning
607e3bff
goerch Fix PR for recent change
fad8a773
goerch Fix PR for recent change
6a16c36b
goerch Fix PR for recent change
a2ddaad5
goerch Fix for compiler warning
3fa8c555
goerch Fixes for more compiler warnings
d6d7d0f0
apage43
ggerganov added high priority
goerch
goerch
cebtenzzre
cebtenzzre commented on 2023-09-30
cebtenzzre marked this pull request as draft 2 years ago
cebtenzzre
goerch Remove unused code
37af613d
goerch
cebtenzzre
goerch Fix initialization of static maps
2117e23f
goerch
cebtenzzre
goerch Add scores and token types back, adapt gptneox
28778f8a
goerch
ggerganov
ggerganov approved these changes on 2023-10-02
ggerganov
ggerganov commented on 2023-10-02
goerch Update llama.cpp
a9a2af93
goerch Update unicode.h
dccd1db4
goerch Update unicode.h
02b9ccfd
goerch Ported Starcoder and added some assertions
3d162cc8
goerch marked this pull request as ready for review 2 years ago
goerch Fix coding style
5aee498d
goerch
goerch Apply @jploski 's fix for missing tokens
3e518e25
goerch merged ff5a3f0c into master 2 years ago
cebtenzzre
goerch
cebtenzzre
goerch
cebtenzzre
goerch
cebtenzzre
goerch
cebtenzzre
staviq
staviq commented on 2023-10-04
goerch
cebtenzzre
maddes8cht
cebtenzzre
maddes8cht
goerch deleted the falcon-tokenizer branch 2 years ago
maddes8cht
