support loading vocab from fast tokenizer config in convert.py #3633
Add HFVocab into convert.py
f7e377d6
Update convert.py
b0e00cb8
Update convert.py
f888d2ea
add bytes_to_unicode function
ea9f35f0
change add_meta_vocab function
c7b636e9
remove debug code
6ec856b3
remove byte_encoder
1f16e5f2
Add newline between classes
e876aec1
Check tokenizer.json when tokenizer.model does not exist.
17784508
Move transformers dependency to local code
a5b26b66
Add error context with 'raise from'
5a1f1780
ggerganov
approved these changes
on 2023-10-18
cebtenzzre
changed the title from "Support huggingface tokenizer without tokenizer.model in convert.py" to "support loading vocab from fast tokenizer config in convert.py" 1 year ago
Add fast tokenizer option to BpeVocab
89611cb0
Merge branch 'master' into convert_hf_vocab
97f690ab
Update convert.py
e7154423
Add VocabLoader and remove *Vocab class
d54764d0
Add transformers dependency
e19b7803
remove added tokens and check newline token to decide spm or bpe
28f09beb
Update convert.py
4adb8b98
Add special token type
13f07013
Update convert.py
f37a7d70
Update convert.py
9f4dc236
Update convert.py
dcf372e6
Fix typo in convert.py
cc1f3fcf
Fix when params.n_vocab < tokenizer vocab size
026eb7cd
update vocab class
2e263ca2
change function name
5ac1949f
Merge branch 'master' into convert_hf_vocab
74d80a88
Remove unused variable/functions, add types to class variable and met…
61edd1bc
fix flake8 warnings
1f5357cb
code style cleanup
8fabb013
make mypy happy
c3b1c12f
change exception
35e95b62
ggerganov
merged
873637af
into master 1 year ago
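For readers skimming the commit list, the fallback described in "Check tokenizer.json when tokenizer.model does not exist." can be sketched roughly as below. This is a minimal illustration and not the code merged in convert.py: the helper name `find_vocab_file` and the order of preference are assumptions made for the example.

```python
from pathlib import Path


def find_vocab_file(model_dir: Path) -> Path:
    """Hypothetical helper: choose which tokenizer file to read the vocab from."""
    # Assumption: the SentencePiece model is preferred when it is present.
    spm_path = model_dir / "tokenizer.model"
    if spm_path.is_file():
        return spm_path

    # Otherwise fall back to the fast tokenizer config.
    fast_path = model_dir / "tokenizer.json"
    if fast_path.is_file():
        return fast_path

    raise FileNotFoundError(
        f"no tokenizer.model or tokenizer.json found in {model_dir}"
    )
```

A caller would pass the model directory and then dispatch on the returned file name to decide how to load the vocabulary.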
Assignees
No one assigned