llama.cpp
support loading vocab from fast tokenizer config in convert.py
#3633
Merged

strutive07
strutive07 Add HFVocab into convert.py
f7e377d6
strutive07 Update convert.py
b0e00cb8
strutive07 Update convert.py
f888d2ea
strutive07 add bytes_to_unicode function
ea9f35f0
strutive07 change add_meta_vocab fucntion
c7b636e9
strutive07 remove debug code
6ec856b3
strutive07 remove byte_encoder
1f16e5f2
strutive07 Add newline between classes
e876aec1
TheBloke
TheBloke
strutive07
strutive07 Check tokenizer.json when tokenizer.model is not exist.
17784508
strutive07
TheBloke
teleprint-me
ggerganov
cebtenzzre
teleprint-me
strutive07 Move transformers dependency to local code
a5b26b66
cebtenzzre
cebtenzzre commented on 2023-10-18
strutive07
strutive07 Add error context with 'raise from'
5a1f1780
ggerganov
ggerganov approved these changes on 2023-10-18
teleprint-me
cebtenzzre
teleprint-me
cebtenzzre
teleprint-me
cebtenzzre
teleprint-me
goerch
cebtenzzre changed the title from "Support huggingface tokenizer without tokenizer.model in convert.py" to "support loading vocab from fast tokenizer config in convert.py" 1 year ago
cebtenzzre
goerch
cebtenzzre
cebtenzzre requested changes on 2023-10-22
teleprint-me
goerch
teleprint-me
strutive07 Add fast tokenizer option to BpeVocab
89611cb0
strutive07
goerch
jploski
TheBloke
strutive07 Merge branch 'master' into convert_hf_vocab
97f690ab
strutive07
strutive07 Update convert.py
e7154423
strutive07
strutive07 Add VocabLoader and remove *Vocab class
d54764d0
strutive07 Add transformers dependency
e19b7803
strutive07
jploski
TheBloke
TheBloke
cebtenzzre
TheBloke
TheBloke
TheBloke
cebtenzzre
TheBloke
TheBloke
cebtenzzre
cebtenzzre commented on 2023-11-04
cebtenzzre
TheBloke
strutive07 remove added tokens and check newline token to decide spm or bpe
28f09beb
TheBloke
strutive07 Update convert.py
4adb8b98
TheBloke
oneCodeScholar
strutive07 Add special token type
13f07013
strutive07
TheBloke
TheBloke
TheBloke
strutive07
TheBloke
strutive07
TheBloke
TheBloke
cebtenzzre
cebtenzzre
KerfuffleV2
strutive07
KerfuffleV2
Chainfire
TheBloke
TheBloke
ArthurZucker
strutive07 Update convert.py
f37a7d70
strutive07 Update convert.py
9f4dc236
strutive07 Update convert.py
dcf372e6
strutive07
apepkuss
apepkuss commented on 2023-11-15
strutive07 Fix typo in convert.py
cc1f3fcf
TheBloke
strutive07 Fix when params.n_vocab < tokenizer vocab size
026eb7cd
strutive07
TheBloke
TheBloke
jonastemplestein
TheBloke
strutive07 update vocab class
2e263ca2
strutive07
seungduk-yanolja
seungduk-yanolja commented on 2023-11-19
cebtenzzre
cebtenzzre commented on 2023-11-21
teleprint-me
cebtenzzre
teleprint-me
strutive07 change funtion name
5ac1949f
strutive07
strutive07 Merge branch 'master' into convert_hf_vocab
74d80a88
teleprint-me
strutive07 Remove unused variable/functions, add types to class variable and met…
61edd1bc
strutive07 fix flake8 warnings
1f5357cb
jonastemplestein
teleprint-me
TheBloke
TheBloke
strutive07
TheBloke
TheBloke
TheBloke
ArthurZucker
ggerganov
ggerganov requested a review from cebtenzzre 1 year ago
strutive07
TheBloke
ggerganov
cebtenzzre code style cleanup
8fabb013
cebtenzzre make mypy happy
c3b1c12f
teleprint-me
cebtenzzre
cebtenzzre commented on 2023-12-13
strutive07 change exception
35e95b62
cebtenzzre
cebtenzzre approved these changes on 2023-12-13
ggerganov merged 873637af into master 1 year ago
ggerganov
Ivanmatthew
Ivanmatthew commented on 2024-04-04