llama.cpp
Add script to convert GGMLv3 LLaMA models to GGUF
#2682
Merged

KerfuffleV2 · 1 year ago (edited)

Currently in a pretty reasonable state. Testing/feedback would be appreciated.

Converted file tested to parse these prompts to the same tokens as pre-GGUF llama.cpp:

  1. 你喜欢小狗吗? ("Do you like puppies?")
  2. Once upon a time, in a dark forest, there lived a little fox

I also tested these models with the second prompt:

  1. Random LLaMA1 7B
  2. openorca-platypus2-13b.ggmlv3.q5_K_M.bin
  3. gplatty-30b-superhot-8k.ggmlv3.q4_K_M.bin
  4. platypus2-70b-instruct.ggmlv3.q4_K_M.bin

Identical generation compared to loading the actual GGML file with pre-GGUF llama.cpp when specifying a seed.

Note: When testing, be sure to specify --eps and --gqa as is appropriate. You'll probably also want to specify --context-length (it defaults to 2048).
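
For example, converting a LLaMA2 70B GGMLv3 file might look something like this (the flags are the script's; the filenames and values here are illustrative):

    python convert-llama-ggmlv3-to-gguf.py \
        --input platypus2-70b-instruct.ggmlv3.q4_K_M.bin \
        --output platypus2-70b-instruct.q4_K_M.gguf \
        --eps 1e-5 --gqa 8 --context-length 4096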

edit: It's now possible to use HF or "original" format metadata like vocab when converting. Some information about this and the current state of the pull: #2682 (comment)

Some perplexity results here: #2682 (comment)

Dampfinchen · 1 year ago

Nice. With the breaking change coming, a script like this is crucial so that people can keep using their models!

KerfuffleV2 marked this pull request as ready for review 1 year ago
KerfuffleV2 · 1 year ago

@TheBloke Would something like this actually be useful for you? It still requires rewriting the whole file, but should be a lot faster than converting from HF or .pth format and then quantizing.

One deficiency currently is that it has to try to revert the vocabulary mangling the initial conversion to GGML performed, and it doesn't seem possible to do this 100% correctly. However, I could potentially add a way to load the vocab from the original model metadata (tokenizer.model, tokenizer_config.json, config.json) and use that rather than the vocab in the GGML file.
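
To illustrate why that reversal can't be fully correct, here is a minimal sketch, assuming the original GGML conversion replaced the SentencePiece meta-symbol U+2581 ("▁") with a plain space (illustrative, not the PR's actual code):

    # Two distinct SentencePiece tokens collapse to the same GGML bytes:
    sp_tokens = ['\u2581hello', ' hello']  # '▁hello' vs. a token with a literal space
    ggml_tokens = [t.replace('\u2581', ' ') for t in sp_tokens]
    assert ggml_tokens[0] == ggml_tokens[1]  # the distinction is gone
    # Reverting can only be a heuristic, e.g. assume every space was '▁':
    recovered = [t.replace(' ', '\u2581') for t in ggml_tokens]

Loading the vocab from the original tokenizer metadata sidesteps the ambiguity entirely.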

klosax · 1 year ago

You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future. There is ongoing work on the tokenizer in llama.cpp and there could be issues later on without the additional data.

klosax · 1 year ago

If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.
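
A minimal sketch of that check in Python (hashlib is standard library; the filenames are hypothetical):

    import hashlib

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    # The converted file should match a GGUF produced directly from the
    # original model (hypothetical filenames):
    print(sha256_of('converted.gguf') == sha256_of('reference.gguf'))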

KerfuffleV2 · 1 year ago

You should add a warning that models converted without a new copy of the needed vocab parts may not be fully functional in the future.

There's already a pretty big warning every time it runs:

=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===

Is that not enough?

I'm also not sure what you mean about "needed vocab parts". I don't think parts are necessarily missing; it's just that special meta information, like which tokens are "unknown", isn't really possible to recover.

If you load the vocab from the original model during conversion, you could compare the model with a real gguf model using sha256sum to verify that your conversion script works.

Maybe that approach could work for just the vocab part (but there's probably an easier way).

I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types; convert-llama-hf-to-gguf.py does. It's also kind of annoying how the latter has everything at the top level, so you can't import and reuse functions.

Anyway, assuming I did implement loading from tokenizer.json or whatever to build a new copy of the vocab rather than using what was in the GGML file, I'd just reuse existing code, so the vocab section at least would be exactly the same as in the other conversion scripts. There's no reason to write a custom version of that.

klosax · 1 year ago

There's already pretty big warning every time it runs:

Ok good.

I've been looking at the existing conversion scripts and it's not exactly clear what the correct approach here is. For example, convert.py doesn't even add token types; convert-llama-hf-to-gguf.py does.

To test things out we made the simpler convert-llama-hf-to-gguf.py scripts first, since convert.py is so complex. The latter is not fully finished yet; work is being done in #2668.

Besides a full copy of the vocab and scores, we should have token types and the special token mapping for eos/bos etc.
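
For reference, a hedged sketch of what writing that metadata might look like with the gguf Python writer (method names as in the gguf package around this time; treat as illustrative, and assume tokens/scores/toktypes were built from the original tokenizer metadata):

    import gguf

    writer = gguf.GGUFWriter('out.gguf', 'llama')  # hypothetical output path
    writer.add_tokenizer_model('llama')
    writer.add_token_list(tokens)      # full copy of the vocab
    writer.add_token_scores(scores)    # SentencePiece scores
    writer.add_token_types(toktypes)   # e.g. NORMAL / UNKNOWN / CONTROL
    writer.add_bos_token_id(1)         # special token mapping
    writer.add_eos_token_id(2)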

Green-Sky commented on convert-llama-ggmlv3-to-gguf.py on 2023-08-20 (resolved)

Maybe add a gguf_writer.add_description('converted from old ggjtv3 file.') or something.

KerfuffleV2 · 1 year ago

Seems very reasonable.

KerfuffleV2 · 1 year ago

Beside a full copy of the vocab and scores we should have token types and the special token mapping for eos/bos etc.

Seems reasonable, and once convert.py is updated I can just use that to load the vocab; my GGUF output for the vocab at least should be exactly the same as the official conversion.

I don't want to go too crazy with the amount of work I put into this when so far there's no indication that it's a candidate to get merged. Also, converting from GGML but also requiring the HF metadata seems like it would be kind of a niche use, and no one's said "I'd actually use this feature!" yet.

Green-Sky · 1 year ago

seems like it would be kind of a niche use

not everyone has endless high speed internet access :)

klosax commented on convert-llama-ggmlv3-to-gguf.py on 2023-08-20 (resolved), on this code:
    parser.add_argument('--input', '-i', help = 'Input GGMLv3 filename')
    parser.add_argument('--output', '-o', help = 'Output GGUF filename')
    parser.add_argument('--gqa', type = int, default = 1, help = 'grouped-query attention factor (use 8 for LLaMA2 70B)')
    parser.add_argument('--eps', default = '5.0e-06', help = 'RMS norm eps (use 1e-5 for LLaMA2)')
klosax · 1 year ago

For LLaMA v1 / OpenLLaMA models the epsilon is 1e-6. The 5e-6 default was a compromise between the two.

KerfuffleV2 · 1 year ago

I guess we'd still want to use that as the default value then, but mention 1e-6 in the help?

KerfuffleV2 · 1 year ago

I updated the help for --eps to be more specific about the values for different model types.
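
Something along these lines, presumably (a hedged guess at the wording, not necessarily the PR's exact text):

    parser.add_argument('--eps', default = '5.0e-06',
        help = 'RMS norm eps: use 1e-6 for LLaMA1 and OpenLLaMA, use 1e-5 for LLaMA2')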

KerfuffleV2 · 1 year ago

@Green-Sky

not everyone has endless high speed internet access :)

So you're saying you need and would use this feature? If so, I'll look into adding it.

Probably need to wait until the convert.py vocab stuff stabilizes; hopefully that will happen at a point that gives me enough time to update this.

KerfuffleV2: First pass at converting GGMLv3 LLaMA models to GGUF (8afc1ef3)
KerfuffleV2: Cleanups, better output during conversion (f7e61fd1)
KerfuffleV2: Fix vocab space conversion logic (08959c88)
KerfuffleV2: More vocab conversion fixes (8083e20d)
KerfuffleV2: Add description to converted GGUF files (ff251343)
KerfuffleV2: Improve help text, expand warning (80912f07)
KerfuffleV2: Allow specifying name and description for output GGUF (f56db216)
KerfuffleV2: Allow overriding vocab and hyperparams from original model metadata (e854cd7d)
KerfuffleV2: Use correct params override var name (996aaca1)
KerfuffleV2: Fix wrong type size for Q8_K (f68aef54)
KerfuffleV2 force-pushed from 297cce33 to f68aef54 1 year ago
ggerganov approved these changes on 2023-08-21

When you are ready, merge this to gguf. I'll merge gguf to master in an hour or two. Alternatively, you can change the target branch to master and merge it after #2398.

KerfuffleV2: Set default value for gguf add_tensor raw_shape KW arg (05477604)
ggerganov merged e06cbcee into gguf 1 year ago
KerfuffleV2 deleted the feat-convert-ggml-to-gguf branch 1 year ago
