Nice. With the breaking change coming, such a script is crucial so that many people can keep using their models!
@TheBloke Would something like this actually be useful for you? It still requires rewriting the whole file, but it should be a lot faster than converting from HF or .pth format and then quantizing.
One deficiency currently is that it has to try to revert the vocabulary mangling the initial conversion to GGML performed, and it doesn't seem possible to do this 100% correctly. However, I could potentially add a way to load the vocab from the original model metadata (tokenizer.model, tokenizer_config.json, config.json) and use that rather than the vocab in the GGML file.
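For reference, the metadata-based approach would look roughly like the sketch below. It assumes the standard SentencePiece tokenizer.model plus an HF-style config.json with bos_token_id/eos_token_id fields; treat it as a rough outline rather than the actual implementation.

```python
# Rough sketch: rebuild the vocab from the original model files instead of the
# copy stored in the GGML file. Assumes a SentencePiece tokenizer.model and an
# HF-style config.json; names here are illustrative, not the final implementation.
import json
from pathlib import Path

from sentencepiece import SentencePieceProcessor

def load_original_vocab(model_dir):
    model_dir = Path(model_dir)
    sp = SentencePieceProcessor(str(model_dir / "tokenizer.model"))
    tokens = [(sp.id_to_piece(i), sp.get_score(i)) for i in range(sp.vocab_size())]

    config = json.loads((model_dir / "config.json").read_text())
    special = {
        "bos": config.get("bos_token_id"),
        "eos": config.get("eos_token_id"),
    }
    return tokens, special
```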
You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future. There is ongoing work on the tokenizer in llama.cpp, and there could be issues later on without the additional data.
If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.
You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future.
There's already a pretty big warning every time it runs:
=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
Is that not enough?
I'm also not sure what you mean by "needed vocab parts". I don't think parts are necessarily missing; it's just that special meta information, like which tokens are "unknown" or whatever, isn't really possible to recover.
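To illustrate what I mean by best-effort: recovering token types from the GGML vocab basically comes down to guessing from the token text, something like the sketch below. The rules and the gguf.TokenType names here are my assumptions, and anything a fine-tune added can't be classified reliably.

```python
# Illustrative heuristic: guess a token's type from its text alone. This is why
# the conversion can only be best-effort; tokens added by a fine-tune can't be
# distinguished from normal ones without the original metadata.
import re

import gguf  # assumed: the gguf Python package with a TokenType enum

def guess_token_type(text):
    if text == "<unk>":
        return gguf.TokenType.UNKNOWN
    if text in ("<s>", "</s>"):
        return gguf.TokenType.CONTROL
    if re.fullmatch(r"<0x[0-9A-Fa-f]{2}>", text):
        return gguf.TokenType.BYTE
    return gguf.TokenType.NORMAL
```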
If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.
Maybe that approach could work for just the vocab part (but there's probably an easier way).
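For the whole-file check, a plain hash comparison would be enough, something like the sketch below (comparing only the vocab section would require actually parsing the files rather than hashing them byte-for-byte):

```python
# Sketch: byte-for-byte comparison of a converted file against a reference GGUF
# via SHA-256. Only meaningful if both files were written with the same field
# ordering and metadata; the file names here are placeholders.
import hashlib

def sha256_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_file("converted.gguf") == sha256_file("reference.gguf"))
```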
I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types, while convert-llama-hf-to-gguf.py does. It's also kind of annoying how the latter just has everything at the top level, so you can't import and reuse functions.
Anyway, assuming I did implement loading from tokenizer.json or whatever to build a new copy of the vocab rather than using what was in the GGML file, I'd just reuse existing code, so the vocab section at least would be exactly the same as in the other conversion scripts. There's no reason to write a custom version of that.
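Concretely, the vocab-writing side would just go through the gguf Python package the same way the other scripts do; roughly something like this sketch (method names are from my reading of gguf-py and may not match exactly):

```python
# Sketch: write the vocab through gguf.GGUFWriter the same way the official
# conversion scripts do. `tokens` is a list of (text, score, token_type) triples
# built elsewhere (e.g. from tokenizer.model); method names are assumed from gguf-py.
import gguf

def write_vocab(writer, tokens):
    writer.add_tokenizer_model("llama")
    writer.add_token_list([text.encode("utf-8") for text, _, _ in tokens])
    writer.add_token_scores([score for _, score, _ in tokens])
    writer.add_token_types([ttype for _, _, ttype in tokens])
```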
There's already a pretty big warning every time it runs:
Ok good.
I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types, while convert-llama-hf-to-gguf.py does.
To test things out we made the simpler convert-llama-hf-to-gguf.py scripts first, since convert.py is so complex. The latter is not fully finished yet; work is being done in #2668.
Besides a full copy of the vocab and scores, we should have token types and the special token mapping for eos/bos, etc.
Besides a full copy of the vocab and scores, we should have token types and the special token mapping for eos/bos, etc.
Seems reasonable, and once convert.py is updated I can just use that to load the vocab, and my GGUF output for the vocab at least should be exactly the same as the official conversion.
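If that happens, the special token mapping would presumably just be a few extra GGUFWriter calls on top of the vocab itself, along these lines (again only a sketch; the ids would come from config.json or tokenizer_config.json, and the method names are my assumption about gguf-py):

```python
# Sketch: record the special token mapping in the GGUF metadata. `special` would
# be built from config.json / tokenizer_config.json during conversion; the
# GGUFWriter method names are assumed from gguf-py.
import gguf

def write_special_tokens(writer, special):
    if special.get("bos") is not None:
        writer.add_bos_token_id(special["bos"])
    if special.get("eos") is not None:
        writer.add_eos_token_id(special["eos"])
    if special.get("unk") is not None:
        writer.add_unk_token_id(special["unk"])
    if special.get("pad") is not None:
        writer.add_pad_token_id(special["pad"])
```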
I don't want to go too crazy with the amount of work I put into this when so far there's no indication that it's a candidate to get merged. Also, converting from GGML but also requiring the HF metadata seems like it would be kind of a niche use, and no one's said "I'd actually use this feature!" yet.
seems like it would be kind of a niche use
not everyone has endless high speed internet access :)
not everyone has endless high speed internet access :)
So you're saying you need and would use this feature? If so, I'll look into adding it.
Probably need to wait until the convert.py vocab stuff stabilizes; hopefully that will happen at a point that gives me enough time to update this.
Currently in a pretty reasonable state. Testing/feedback would be appreciated.
Converted file tested to parse these prompts to the same tokens as pre-GGUF llama.cpp:
你喜欢小狗吗?
Once upon a time, in a dark forest, there lived a little fox
I also tested these models with the second prompt:
openorca-platypus2-13b.ggmlv3.q5_K_M.bin
gplatty-30b-superhot-8k.ggmlv3.q4_K_M.bin
platypus2-70b-instruct.ggmlv3.q4_K_M.bin
Generation was identical compared to loading the actual GGML file with pre-GGUF llama.cpp when specifying a seed.
Note: When testing, be sure to specify --eps and --gqa as appropriate. You'll probably also want to specify --context-length (it defaults to 2048).

edit: It's now possible to use HF or "original" format metadata like the vocab when converting. Some information about this and the current state of the pull: #2682 (comment)
Some perplexity results here: #2682 (comment)