Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support there, or can we merge that first and write `GGUFSplitWriter` support into this PR? #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter. The changes would be minimal anyway - just changing the override signatures slightly.
I think I already resolved most of the integration/split management nicely-ish in `GGUFSplitWriter` (and don't really want to integrate splitting directly into `GGUFWriter`, for a couple of reasons - primarily that `GGUFWriter` should only write a single file). Unfortunately, this PR doesn't allow for the last ideal change - only adding KV data/tensors in their respective `write_to_file` functions rather than in an `init_shards` setup function - because both the `self.tensors` list and the `self.kv_data` dict must be fully populated by the call to `write_header_to_file` for the correct tensor-info/KV data counts to be written (and there's no good way around this anyway).
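The constraint above can be sketched as follows - a GGUF-style header stores the tensor and KV counts up front, so both collections must be final before the header is written. This is an illustrative layout only, not the exact `gguf-py` implementation:

```python
import struct

def write_header(tensors: dict, kv_data: dict) -> bytes:
    # The counts are baked into the header, so tensors/kv_data must be
    # fully populated before this is called.
    header = struct.pack("<I", 0x46554747)     # b"GGUF" magic, little-endian
    header += struct.pack("<I", 3)             # format version (illustrative)
    header += struct.pack("<Q", len(tensors))  # tensor count
    header += struct.pack("<Q", len(kv_data))  # KV pair count
    return header

hdr = write_header({"tok_embd.weight": None}, {"general.architecture": "llama"})
assert hdr[:4] == b"GGUF"
assert len(hdr) == 24
```

Adding a tensor after this point would leave the on-disk count stale, which is why a setup-time population step is hard to avoid.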
That being said, as a general thing, I like these changes. `add_key` being essentially an alias for a particular call to `add_value` had confused me, and re: tempfile, good change - most users will never encounter nor need it. One thing you didn't note that I might add: delaying the naming of the `GGUFWriter` outfile also delays the file creation, which has some small benefits - e.g. not creating (or overwriting a previous output with) a blank file if the script is stopped before it actually needs to write anything.
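The deferred-creation pattern described here can be sketched like this (hypothetical names, not the real `GGUFWriter` API): the writer remembers the path but only opens the file on first write, so an aborted run never clobbers an existing output.

```python
import os
import tempfile

class DeferredWriter:
    """Toy sketch: defer file creation until the first actual write."""
    def __init__(self):
        self.path = None
        self.fout = None

    def set_output(self, path):
        self.path = path  # remember the name; create nothing yet

    def write(self, data: bytes):
        if self.fout is None:
            self.fout = open(self.path, "wb")  # file created only now
        self.fout.write(data)

outdir = tempfile.mkdtemp()
w = DeferredWriter()
w.set_output(os.path.join(outdir, "model.gguf"))
assert not os.path.exists(w.path)  # nothing on disk before the first write
w.write(b"GGUF")
w.fout.close()
assert os.path.exists(w.path)
```

If the script is interrupted anywhere before `write()`, the previous output file (if any) is left untouched.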
Also, I'm not very familiar with the inner workings of the GGUF specification, but does it matter in which order the metadata is included? In theory, you could have two identical model GGUFs that differ only in metadata ordering; they would appear different by SHA256 checksum but would be, well, identical. Is that a possibility worth concern?
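To make the concern concrete, here's a minimal illustration (plain `hashlib`/`json`, nothing GGUF-specific): the same key/value pairs serialized in two different orders hash to different digests.

```python
import hashlib
import json

# Same mapping, different insertion order.
a = {"general.name": "tiny", "general.architecture": "llama"}
b = {"general.architecture": "llama", "general.name": "tiny"}

# json.dumps preserves insertion order, so the serialized bytes differ.
digest_a = hashlib.sha256(json.dumps(a).encode()).hexdigest()
digest_b = hashlib.sha256(json.dumps(b).encode()).hexdigest()

assert a == b                # semantically identical...
assert digest_a != digest_b  # ...but different checksums
```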
> Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write `GGUFSplitWriter` support into this PR?
@christianazinn I use `git worktree` in a bunch of folders :)
I'm fine with either PR being merged into `master` first. (Merges across PRs can otherwise be weird unless well prepared.)
> #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter.
Yeah, I'll test #6942 tomorrow (it's night for me right now) and will likely approve after that. But I think it would be cleaner to refactor first to make the feature simpler, although it's not strictly necessary (the simplifications can come after too).
> I think I already resolved most of the integration/split management nicely-ish in `GGUFSplitWriter` (and don't really want to integrate splitting directly into `GGUFWriter` for a couple reasons, primarily that `GGUFWriter` should only write a single file)
There's also another way to see `GGUFWriter`: that it writes a single model. But I agree it's likely cleaner to keep it to a single `GGUFWriter` per output file. We'll see.
> Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included?
The order of the metadata doesn't really matter; it's currently governed by the insertion order from `convert-hf-to-gguf.py`. It was like this before this PR, and this behavior is kept in this PR too.
> In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?
Hmm, you're right that this could sometimes be problematic - for example, when adding `tokenizer.ggml.pre` to a model which did not have it, with `gguf-new-metadata.py`. Might be worth it eventually to sort the keys, but it's out of scope for this PR.
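A sketch of what such (out-of-scope) key sorting could look like - `normalized` is a hypothetical helper, not part of `gguf-py`:

```python
def normalized(kv_data: dict) -> dict:
    # Re-insert keys in sorted order so serialization is deterministic
    # regardless of the order the converter added them in.
    return dict(sorted(kv_data.items()))

# Same metadata added in two different orders by two tools:
a = {"general.name": "tiny", "tokenizer.ggml.pre": "llama-bpe"}
b = {"tokenizer.ggml.pre": "llama-bpe", "general.name": "tiny"}

assert list(a) != list(b)                          # different raw order
assert list(normalized(a)) == list(normalized(b))  # identical after sorting
```

With sorted keys, two files with the same metadata would serialize to identical bytes, restoring checksum comparability.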
I like this PR moving towards decoupling the writing and the metadata. Was wishing for a way to not have to keep making new functions for each metadata entry.
Did some further tests with 32d11db (using https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe).

I've tested `--vocab-only` to make sure it still works:
```console
$ sha256sum tinymistral-moe-*defer*
316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-defer.q8_0.gguf
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-lazy-defer-vocab.q8_0.gguf
$ sha256sum tinymistral-moe-master-vocab.q8_0.gguf
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-master-vocab.q8_0.gguf
```
(the checksum with all tensors matches the one in this PR's main post)
I've also tested whether `convert-legacy-llama.py` still works, and it does!
```console
$ sha256sum tinymistral-moe-*legacy*
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-legacy-vocab.f16.gguf
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-master-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-master-legacy-vocab.f16.gguf
```
Sanity checking before merging. Looks alright.
This is an attempt to simplify both #7499 and #6942 by slightly changing how `GGUFWriter` works internally.

@mofosyne, @christianazinn, I think this might be interesting for you.
Summary of changes
- `GGUFWriter.add_key` and `GGUFWriter.add_val` have been replaced by `GGUFWriter.add_key_value`.
  - `add_key` should never have been used alone anyway; this was error-prone.
  - `gguf-py/scripts/gguf-new-metadata.py` used this, and was easy enough to adapt.
- `use_temp_file` is now opt-in instead of opt-out (it now defaults to `False`).
  - This helps `gguf-new-metadata.py`, which uses the default `use_temp_file` value but which already uses mmaped Numpy tensors from GGUFReader, so a temporary file is redundant when it's there by surprise.
- `GGUFWriter` doesn't really need the output file name until actually writing to it.
- `GGUFWriter` doesn't really need to eagerly prepare the data layout of the metadata.
- Tensors and metadata are kept in the `tensors` and `kv_data` dicts of a `GGUFWriter` instance before writing to a file. These are `dict`s because they map tensor names to tensor info, and keys to values, respectively, and duplicates of either tensor names or keys are detected and should not happen anyway. Since Python 3.7, the iteration order of `dict` is guaranteed to be the same as the insertion order (and `gguf-py` already depends on Python `>=3.8` in its `pyproject.toml`).

These changes simplify the internal state of `GGUFWriter`.
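The dict-based bookkeeping described above can be sketched as follows (a toy model of the behavior, not the actual `GGUFWriter` code): keys keep their insertion order, and duplicates are rejected.

```python
class TinyWriter:
    """Toy stand-in for the refactored writer state (not the real class)."""
    def __init__(self):
        self.kv_data = {}  # metadata key -> value, in insertion order
        self.tensors = {}  # tensor name -> tensor info

    def add_key_value(self, key, value):
        # Duplicate keys indicate a bug in the caller, so fail loudly.
        if key in self.kv_data:
            raise ValueError(f"duplicated key: {key!r}")
        self.kv_data[key] = value

w = TinyWriter()
w.add_key_value("general.architecture", "llama")
w.add_key_value("general.name", "TinyMistral")
# Insertion order is preserved (guaranteed for dict since Python 3.7).
assert list(w.kv_data) == ["general.architecture", "general.name"]
```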
Testing
I already listed checksums of many models in #7234, so I'll be testing some of these same models to ensure nothing changed.
Since `gguf-new-metadata.py` was also changed, why not also test it: adding `--general-description "Hello world"`, then removing it with `--remove-metadata general.description`.