llama.cpp
support loading vocab from fast tokenizer config in convert.py
#3633

Merged

ggerganov merged 32 commits into ggml-org:master from strutive07:convert_hf_vocab

Add HFVocab into convert.py

f7e377d6

Update convert.py

b0e00cb8

Update convert.py

f888d2ea

add bytes_to_unicode function

ea9f35f0
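The commit above names a `bytes_to_unicode` function, but its body is not shown on this page. As a rough illustration, the sketch below assumes it resembles the byte-to-printable-character table used by GPT-2 style byte-level BPE tokenizers; it is not taken from this PR's diff.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a printable unicode character.

    Assumption: this mirrors the GPT-2 byte-level BPE table, not
    necessarily the exact code added to convert.py in this PR.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted
    # to code points starting at 256 so they stay visible.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))
```

With this table, the space byte maps to `Ġ` and the newline byte to `Ċ`, which is why those characters show up in byte-level BPE vocabularies.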

change add_meta_vocab function

c7b636e9

remove debug code

6ec856b3

remove byte_encoder

1f16e5f2

Add newline between classes

e876aec1

Check tokenizer.json when tokenizer.model does not exist.

17784508
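The fallback described by this commit can be sketched as below. The helper name `find_vocab_file` and the exact lookup order are assumptions made for illustration; the real logic in convert.py may differ.

```python
from pathlib import Path
from typing import Optional


def find_vocab_file(model_dir: Path) -> Optional[Path]:
    """Return the first tokenizer file found in model_dir, or None.

    Hypothetical helper: prefers the SentencePiece tokenizer.model and
    falls back to the fast tokenizer config tokenizer.json.
    """
    for name in ("tokenizer.model", "tokenizer.json"):
        candidate = model_dir / name
        if candidate.is_file():
            return candidate
    return None
```

A caller would treat a `None` result as "no usable vocab file in this directory" and report an error.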

Move transformers dependency to local code

a5b26b66

cebtenzzre commented on 2023-10-18

Add error context with 'raise from'

5a1f1780
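For readers unfamiliar with the idiom named in this commit, `raise ... from` chains a new exception to the one that caused it, so the traceback shows both. The function below is a hypothetical example and not the code changed in this PR.

```python
import json


def parse_tokenizer_config(text: str) -> dict:
    """Parse tokenizer JSON text, adding context if parsing fails.

    Hypothetical function used only to demonstrate 'raise from'.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # 'from e' stores the original error on __cause__, so the
        # traceback prints it as the direct cause of the new error.
        raise ValueError("could not parse tokenizer config") from e
```

Without `from e`, Python would still print the original exception, but as an error that happened "during handling" rather than as the explicit cause.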

ggerganov approved these changes on 2023-10-18

cebtenzzre changed the title ~~Support huggingface tokenizer without tokenizer.model in convert.py~~ support loading vocab from fast tokenizer config in convert.py 2 years ago

cebtenzzre requested changes on 2023-10-22

Add fast tokenizer option to BpeVocab

89611cb0

Merge branch 'master' into convert_hf_vocab

97f690ab

Update convert.py

e7154423

Add VocabLoader and remove *Vocab class

d54764d0

Add transformers dependency

e19b7803

cebtenzzre commented on 2023-11-04

remove added tokens and check newline token to decide spm or bpe

28f09beb

Update convert.py

4adb8b98

Add special token type

13f07013

Update convert.py

f37a7d70

Update convert.py

9f4dc236

Update convert.py

dcf372e6

apepkuss commented on 2023-11-15

Fix typo in convert.py

cc1f3fcf

Fix when params.n_vocab < tokenizer vocab size

026eb7cd

update vocab class

2e263ca2

seungduk-yanolja commented on 2023-11-19

cebtenzzre commented on 2023-11-21

change function name

5ac1949f

Merge branch 'master' into convert_hf_vocab

74d80a88

Remove unused variable/functions, add types to class variable and met…

61edd1bc

fix flake8 warnings

1f5357cb

ggerganov requested a review from cebtenzzre 2 years ago

code style cleanup

8fabb013

make mypy happy

c3b1c12f

cebtenzzre commented on 2023-12-13

change exception

35e95b62

cebtenzzre approved these changes on 2023-12-13

ggerganov merged 873637af into master 2 years ago

Ivanmatthew commented on 2024-04-04

Reviewers

ggerganov

cebtenzzre

teleprint-me

Ivanmatthew

seungduk-yanolja

apepkuss
