Nice. With the breaking change coming, such a script is crucial so that many people can keep using their models!
@TheBloke Would something like this actually be useful for you? It still requires rewriting the whole file, but it should be a lot faster than converting from HF or .pth format and then quantizing.
One deficiency currently is that it has to try to revert the vocabulary mangling the initial conversion to GGML performed, and it doesn't seem possible to do this 100% correctly. However, I could potentially add a way to load the vocab from the original model metadata (tokenizer.model, tokenizer_config.json, config.json) and use that rather than the vocab in the GGML file.
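For reference, the metadata-based approach would look roughly like the sketch below. It assumes the standard SentencePiece tokenizer.model plus an HF-style config.json with bos_token_id/eos_token_id fields; treat it as a rough outline rather than the actual implementation.

```python
# Rough sketch: rebuild the vocab from the original model files instead of the
# copy stored in the GGML file. Assumes a SentencePiece tokenizer.model and an
# HF-style config.json; names here are illustrative, not the final implementation.
import json
from pathlib import Path

from sentencepiece import SentencePieceProcessor

def load_original_vocab(model_dir):
    model_dir = Path(model_dir)
    sp = SentencePieceProcessor(str(model_dir / "tokenizer.model"))
    tokens = [(sp.id_to_piece(i), sp.get_score(i)) for i in range(sp.vocab_size())]

    config = json.loads((model_dir / "config.json").read_text())
    special = {
        "bos": config.get("bos_token_id"),
        "eos": config.get("eos_token_id"),
    }
    return tokens, special
```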
You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future. There is ongoing work on the tokenizer in llama.cpp, and there could be issues later on without the additional data.
If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.
You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future.
There's already a pretty big warning every time it runs:
=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
Is that not enough?
I'm also not sure what you mean by "needed vocab parts". I don't think parts are necessarily missing; it's just that special meta information, like which tokens are "unknown" or whatever, isn't really possible to recover.
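To illustrate what I mean by best-effort: recovering token types from the GGML vocab basically comes down to guessing from the token text, something like the sketch below. The rules and the gguf.TokenType names here are my assumptions, and anything a fine-tune added can't be classified reliably.

```python
# Illustrative heuristic: guess a token's type from its text alone. This is why
# the conversion can only be best-effort; tokens added by a fine-tune can't be
# distinguished from normal ones without the original metadata.
import re

import gguf  # assumed: the gguf Python package with a TokenType enum

def guess_token_type(text):
    if text == "<unk>":
        return gguf.TokenType.UNKNOWN
    if text in ("<s>", "</s>"):
        return gguf.TokenType.CONTROL
    if re.fullmatch(r"<0x[0-9A-Fa-f]{2}>", text):
        return gguf.TokenType.BYTE
    return gguf.TokenType.NORMAL
```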
If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.
Maybe that approach could work for just the vocab part (but there's probably an easier way).
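For the whole-file check, a plain hash comparison would be enough, something like the sketch below (comparing only the vocab section would require actually parsing the files rather than hashing them byte-for-byte):

```python
# Sketch: byte-for-byte comparison of a converted file against a reference GGUF
# via SHA-256. Only meaningful if both files were written with the same field
# ordering and metadata; the file names here are placeholders.
import hashlib

def sha256_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_file("converted.gguf") == sha256_file("reference.gguf"))
```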
I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types, while convert-llama-hf-to-gguf.py does. It's also kind of annoying how the latter just has everything at the top level, so you can't import and reuse functions.
Anyway, assuming I did implement loading from tokenizer.json or whatever to build a new copy of the vocab rather than using what was in the GGML file, I'd just reuse existing code, so the vocab section at least would be exactly the same as in the other conversion scripts. There's no reason to write a custom version of that.
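Concretely, the vocab-writing side would just go through the gguf Python package the same way the other scripts do; roughly something like this sketch (method names are from my reading of gguf-py and may not match exactly):

```python
# Sketch: write the vocab through gguf.GGUFWriter the same way the official
# conversion scripts do. `tokens` is a list of (text, score, token_type) triples
# built elsewhere (e.g. from tokenizer.model); method names are assumed from gguf-py.
import gguf

def write_vocab(writer, tokens):
    writer.add_tokenizer_model("llama")
    writer.add_token_list([text.encode("utf-8") for text, _, _ in tokens])
    writer.add_token_scores([score for _, score, _ in tokens])
    writer.add_token_types([ttype for _, _, ttype in tokens])
```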
There's already a pretty big warning every time it runs:
Ok good.
I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types, while convert-llama-hf-to-gguf.py does.
To test things out we made the simpler convert-llama-hf-to-gguf.py scripts first, since convert.py is so complex. The latter is not fully finished yet; work is being done in #2668.
Besides a full copy of the vocab and scores, we should have token types and the special token mapping for eos/bos, etc.
Besides a full copy of the vocab and scores, we should have token types and the special token mapping for eos/bos, etc.
Seems reasonable, and once convert.py is updated I can just use that to load the vocab, and my GGUF output for the vocab at least should be exactly the same as the official conversion.
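If that happens, the special token mapping would presumably just be a few extra GGUFWriter calls on top of the vocab itself, along these lines (again only a sketch; the ids would come from config.json or tokenizer_config.json, and the method names are my assumption about gguf-py):

```python
# Sketch: record the special token mapping in the GGUF metadata. `special` would
# be built from config.json / tokenizer_config.json during conversion; the
# GGUFWriter method names are assumed from gguf-py.
import gguf

def write_special_tokens(writer, special):
    if special.get("bos") is not None:
        writer.add_bos_token_id(special["bos"])
    if special.get("eos") is not None:
        writer.add_eos_token_id(special["eos"])
    if special.get("unk") is not None:
        writer.add_unk_token_id(special["unk"])
    if special.get("pad") is not None:
        writer.add_pad_token_id(special["pad"])
```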
I don't want to go too crazy with the amount of work I put into this when so far there's no indication that it's a candidate to get merged. Also, converting from GGML but also requiring the HF metadata seems like it would be kind of a niche use, and no one's said "I'd actually use this feature!" yet.
seems like it would be kind of a niche use
not everyone has endless high speed internet access :)
not everyone has endless high speed internet access :)
So you're saying you need and would use this feature? If so, I'll look into adding it.
Probably need to wait until the convert.py vocab stuff stabilizes; hopefully that will happen at a point that gives me enough time to update this.
Currently in a pretty reasonable state. Testing/feedback would be appreciated.
Converted file tested to parse these prompts to the same tokens as pre-GGUF llama.cpp:
你喜欢小狗吗?
Once upon a time, in a dark forest, there lived a little fox
I also tested these models with the second prompt:
openorca-platypus2-13b.ggmlv3.q5_K_M.bin
gplatty-30b-superhot-8k.ggmlv3.q4_K_M.bin
platypus2-70b-instruct.ggmlv3.q4_K_M.bin
Generation was identical compared to loading the actual GGML file with pre-GGUF llama.cpp when specifying a seed.
Note: When testing, be sure to specify --eps and --gqa as appropriate. You'll probably also want to specify --context-length (it defaults to 2048).

edit: It's now possible to use HF or "original" format metadata like the vocab when converting. Some information about this and the current state of the pull: #2682 (comment)
Some perplexity results here: #2682 (comment)