Work on the BPE tokenizer #3252
Work on the BPE tokenizer
bfaab6f4
Try to fix build problem
89e74c67
Fix debug assertion failure
77704232
Fix MSVC Unicode BOM problem
37cf135c
Cleanup and an improvement
91a527a0
Fix compiler warning
208d3d7c
Cleanup
407f76d9
Test doesn't work over the full range of Unicodes
311fcf11
Update .gitignore and Makefile
c85cb29b
Another Makefile rule
048e659d
Testing Aquila
c0990bb7
Moving byte decoding back to `token_to_piece` ...
1b7c3692
Guarding some unusable code paths
a4e9448e
Streamlining code and adding some more assertions
17ca8327
Adding a comment
4abbfb51
Adding another assertion
59a30b76
Fixed vocabulary guarding assertions
a6070b7c
Merge branch 'master' into falcon-tokenizer
16c06fe2
Fix PR for recent change
c09330ed
Fix PR for recent change
9cfb7145
Fix for compiler warning
607e3bff
Fix PR for recent change
fad8a773
Fix PR for recent change
6a16c36b
Fix PR for recent change
a2ddaad5
Fix for compiler warning
3fa8c555
Fixes for more compiler warnings
d6d7d0f0
cebtenzzre
marked this pull request as draft 2 years ago
Remove unused code
37af613d
Fix initialization of static maps
2117e23f
Add scores and token types back, adapt gptneox
28778f8a
ggerganov
approved these changes
on 2023-10-02
Update llama.cpp
a9a2af93
Update unicode.h
dccd1db4
Update unicode.h
02b9ccfd
Ported Starcoder and added some assertions
3d162cc8
goerch
marked this pull request as ready for review 2 years ago
Fix coding style
5aee498d
Apply @jploski's fix for missing tokens
3e518e25
goerch
merged
ff5a3f0c
into master 2 years ago
staviq
commented
on 2023-10-04
goerch
deleted the falcon-tokenizer branch 2 years ago