llama.cpp
gguf-py : decouple adding metadata from writing in GGUFWriter #7827
Merged

compilade commented 346 days ago

This is an attempt to simplify both #7499 and #6942 by slightly changing how GGUFWriter works internally.

@mofosyne, @christianazinn, I think this might be interesting for you.

Summary of changes

  • The only breaking change is that GGUFWriter.add_key and GGUFWriter.add_val have been replaced by GGUFWriter.add_key_value (see the sketch after this list).
    • add_key should never have been used alone anyway; that was error-prone.
    • Only gguf-py/scripts/gguf-new-metadata.py used it, and it was easy enough to adapt.
  • use_temp_file is now opt-in instead of opt-out (it now defaults to False).
    • It's unnecessary in most cases thanks to lazy tensors, and it was otherwise silently enabled in places that never needed it. For example, gguf-new-metadata.py used the default use_temp_file value even though it already uses mmaped Numpy tensors from GGUFReader, so the temporary file was redundant overhead.
  • GGUFWriter doesn't really need the output file name until it actually writes to the file.
    • It's now possible to choose the file name at any time before writing the file header, including during the brief window when the metadata of all included tensors is available.
      • This should reduce the number of parameters needed in #7499.
  • GGUFWriter doesn't need to eagerly prepare the data layout of the metadata.
    • It's possible to choose what to include and in what order by modifying the tensors and kv_data dicts of a GGUFWriter instance before writing to a file.
      • These are now dicts because they map tensor names to tensor info and keys to values, respectively; duplicate tensor names or keys are detected and should not happen anyway. Since Python 3.7, the iteration order of dict is guaranteed to match the insertion order (and gguf-py already depends on Python >=3.8 in its pyproject.toml).
      • This should help #6942 integrate splitting into GGUFWriter.

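To make the new flow concrete, here is a minimal sketch of how the reworked GGUFWriter might be used. The method names (add_key_value, add_uint32, add_tensor, the write_*_to_file calls) come from gguf-py, but the exact signatures, the key my.example.value, and the tensor contents are assumptions for illustration, not a verbatim excerpt:

```python
import numpy as np
from gguf import GGUFWriter, GGUFValueType

# No output path yet: metadata and tensors are only recorded in memory
# (use_temp_file now defaults to False, so tensors stay as-is until writing).
writer = GGUFWriter(path=None, arch="llama")

# These calls only populate writer.kv_data; nothing touches the disk yet.
writer.add_key_value("general.description", "Hello world", GGUFValueType.STRING)
writer.add_uint32("my.example.value", 42)  # hypothetical key, for illustration

# Likewise, this only populates writer.tensors.
writer.add_tensor("example.weight", np.zeros((16, 16), dtype=np.float32))

# The file name is only needed from here on, once all metadata is known.
writer.write_header_to_file(path="example.gguf")
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```
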
Testing

I already listed checksums of many models in #7234, so I'll be testing some of these same models to ensure nothing changed.

  • Tinyllama https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0:
    • $ sha256sum tinyllama-lazy-decouple.*
      5af5abc9122fd3cd113b5bdd828e0f1c1737a432e3de0b80cd403a32659f07a6  tinyllama-lazy-decouple.bf16.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple.q8_0.gguf
      $ sha256sum tinyllama-*-tempfile.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-eager-decouple-tempfile.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-tempfile.q8_0.gguf    
      Matches the older PR.
  • TinyMistral MoE https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe
    • $ sha256sum tinymistral-moe-lazy-decouple.*
      d6e6e1977ffa9cc365d425c107d7a79770b9425cab9db04d695e092fedd00d72  tinymistral-moe-lazy-decouple.bf16.gguf
      316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-decouple.q8_0.gguf
      Matches the older PR.

Since gguf-new-metadata.py was also changed, why not also test it:

  • Tinyllama: round-trip with --general-description "Hello world", then removing it with --remove-metadata general.description
    • $ sha256sum tinyllama-lazy-decouple-*
      44fda7dd804d827a0e03f1bffc3d734f41426a29fe5c4359d133423a8430b809  tinyllama-lazy-decouple-description.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-no-description.q8_0.gguf
      Matches the above.
compilade added commit fe59f20d: gguf-py : decouple adding metadata from writing in GGUFWriter
github-actions added the python label
compilade added the refactoring label
compilade added the Review Complexity : Medium label
christianazinn commented 346 days ago (edited)

Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR? #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter. The changes would be minimal anyway, just changing the override signatures slightly.

I think I already resolved most of the integration/split management nicely-ish in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter for a couple reasons, primarily that GGUFWriter should only write a single file). Unfortunately, this PR doesn't allow for the last ideal change, namely adding KV data/tensors in their respective write_to_file functions rather than in an init_shards setup function: both self.tensors and self.kv_data must be fully populated by the call to write_header_to_file for the correct ti/kv data counts to be written (and there's no good way around this anyway).
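
To see why those counts pin down the ordering: the GGUF header itself stores the tensor count and KV count right after the magic and version, so both must be final before write_header_to_file runs. A minimal sketch of that header layout (GGUF v3, little-endian) follows; the counts are made up for illustration, and this is not gguf-py's implementation:

```python
import struct

def pack_gguf_header(tensor_count: int, kv_count: int) -> bytes:
    # GGUF header: 4-byte magic, uint32 version, uint64 tensor count,
    # uint64 metadata KV count; the counts are baked in up front.
    return b"GGUF" + struct.pack("<IQQ", 3, tensor_count, kv_count)

# Illustrative counts only; adding a tensor or KV pair after the header
# has been written would make these values wrong.
header = pack_gguf_header(tensor_count=201, kv_count=23)
assert len(header) == 4 + 4 + 8 + 8
```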

That being said, as a general thing, I like these changes. add_key being essentially an alias for a particular call to add_val had confused me, and re: tempfile, good change - most users will never encounter nor need it. One thing you didn't note that I might add: delaying the naming of the GGUFWriter outfile also delays the file creation, which has some small benefits, e.g. not creating a blank file (or overwriting a previous output file with one) if the script is stopped before it actually needs to write anything.

Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included? In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

compilade commented 346 days ago

> Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR?

@christianazinn I use git worktree in a bunch of folders :)

I'm fine with either PR being merged into master first. (merges across PRs can otherwise be weird unless well prepared)

> #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter.

Yeah, I'll test #6942 tomorrow (it's night for me right now) and will likely approve after that. But I think it would be cleaner to refactor first to make the feature simpler, although it's not strictly necessary (the simplifications can come after too).

> I think I already resolved most of the integration/split management nicely-ish in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter for a couple reasons, primarily that GGUFWriter should only write a single file)

There's also another way to see GGUFWriter: that it writes a single model. But I agree it's likely cleaner to keep it to a single GGUFWriter per output file. We'll see.

> Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included?

The order of the metadata doesn't really matter, and it's currently governed by the insertion order from convert-hf-to-gguf.py. It was like this before this PR, and this behavior is kept in this PR too.

> In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

Hmm, you're right that this could sometimes be problematic, like when adding (for example) tokenizer.ggml.pre with gguf-new-metadata.py to a model which did not have it. It might be worth sorting the keys eventually, but that's out of scope for this PR.
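
For what such sorting could look like (illustrative only, not part of this PR): since kv_data is now a plain dict and Python dicts preserve insertion order, rebuilding it in key order before writing would canonicalize the output. The helper name sort_metadata is hypothetical:

```python
from gguf import GGUFWriter

def sort_metadata(writer: GGUFWriter) -> None:
    # Hypothetical helper: rebuild kv_data in sorted key order so that
    # write_kv_data_to_file() emits a deterministic, order-independent layout.
    writer.kv_data = dict(sorted(writer.kv_data.items()))
```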

mofosyne commented 346 days ago

I like this PR moving towards decoupling the writing and the metadata. I was wishing for a way to not have to keep making new functions for each metadata entry.

ggerganov approved these changes on 2024-06-08
compilade added the merge ready label
compilade removed the merge ready label
compilade added commit 32d11dbb: gguf-py : always defer GGUFWriter output file opening
compilade commented 345 days ago

Did some further tests with 32d11db:

(using https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe)

I've tested --vocab-only to make sure it still works:

$ sha256sum tinymistral-moe-*defer*
316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-defer.q8_0.gguf
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-lazy-defer-vocab.q8_0.gguf
$ sha256sum tinymistral-moe-master-vocab.q8_0.gguf 
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-master-vocab.q8_0.gguf

(the checksum with all tensors matches the one in this PR's main post)

I've also tested to see whether convert-legacy-llama.py still works, and it does!

$ sha256sum tinymistral-moe-*legacy*
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-legacy-vocab.f16.gguf
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-master-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-master-legacy-vocab.f16.gguf
compilade added the merge ready label
mofosyne approved these changes on 2024-06-09
mofosyne commented 345 days ago

Sanity checking before merging. Looks alright.

mofosyne merged ed9f2521 into master 345 days ago
mofosyne deleted the compilade/gguf-py-decouple-writer-meta branch 345 days ago
