llama.cpp
gguf-py : support lazy tensor splitting
#12809
Merged

ngxson merged 2 commits into master from compilade/lazy-tuples
compilade commented 133 days ago (edited 133 days ago) · 👍 1

Splitting usually involves returning tuples of tensors, which need to be handled properly to avoid early eager evaluation.

As explained in #12791 (comment), this will likely help reduce RAM usage when converting Llama4, since the approach in #12791 uses torch.split on the FFN projections.
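To illustrate the idea of keeping a split lazy, here is a minimal sketch in plain Python. It is not the gguf-py implementation: `LazyValue`, `lazy_split`, and `load_weights` are hypothetical names, and Python lists stand in for tensors. The point it shows is that a split can return a tuple of deferred pieces, so the source is only loaded when one of the pieces is actually needed.

```python
from __future__ import annotations

from typing import Any, Callable


class LazyValue:
    """Hypothetical stand-in for a lazy tensor: wraps a function and runs it on demand."""

    def __init__(self, thunk: Callable[[], Any]):
        self._thunk = thunk
        self._cached: Any = None
        self._done = False

    def evaluate(self) -> Any:
        # Run the deferred computation once and cache the result.
        if not self._done:
            self._cached = self._thunk()
            self._done = True
        return self._cached


def lazy_split(source: LazyValue, sizes: list[int]) -> tuple[LazyValue, ...]:
    """Return one lazy piece per chunk size without evaluating `source`."""
    bounds = []
    start = 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    # Each piece slices the source only when it is itself evaluated.
    # The default arguments bind lo/hi per piece (avoids late-binding closures).
    return tuple(
        LazyValue(lambda lo=lo, hi=hi: source.evaluate()[lo:hi])
        for lo, hi in bounds
    )


load_count = 0


def load_weights() -> list[int]:
    """Pretend to read a large tensor from disk (assumption for the demo)."""
    global load_count
    load_count += 1
    return list(range(10))


weights = LazyValue(load_weights)
gate, up = lazy_split(weights, [4, 6])  # returns a tuple of lazy pieces
loads_before = load_count               # the source has not been loaded yet
first = gate.evaluate()                 # the load happens here, on demand
```

An eager split would instead have called `load_weights()` as soon as the tuple was built, which is the early evaluation the description above wants to avoid.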

TODO:

  • Test conversion with Llama4 and make sure this helps with RAM usage and that the output is the same
    • @bartowski1182 or @ddh0, if you have the hashes from a previous conversion of the slow-to-convert model(s), that would be helpful (although the hash may depend on the name of the source directory, since the model metadata potentially includes part of it)

compilade gguf-py : support lazy tensor splitting
6cbbd8e1
compilade added the python label
compilade requested a review from ngxson 133 days ago
compilade gguf-py : fix flake8 lint
da140da7
bartowski1182 commented 133 days ago

My sha256sum is 56a723c60b94a95a5814c1ac6d5382b3011cb9931763e20f6f14aec264348bf2

I may be able to pull your changes and see if it's different, but from looking at previously uploaded conversions it doesn't look like any folder metadata gets in there, and I don't add any of my own, so it should match up.

bartowski1182 commented 133 days ago · ❤ 2

sha256 of conversion with this change: 56a723c60b94a95a5814c1ac6d5382b3011cb9931763e20f6f14aec264348bf2

so it matches, woo!

The conversion wasn't WAY faster, still took well over an hour, I think about 1:30, but still faster than before which was over 1:45 🤷
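For anyone who wants to repeat the hash comparison above without the `sha256sum` tool, a small Python sketch follows. The helper name `sha256_of_file` and the chunk size are assumptions for illustration; the file is read in chunks so that a multi-gigabyte GGUF file does not need to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on the output of a conversion before and after the change, and comparing the two hex strings, is enough to confirm the files are byte-identical.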

ngxson approved these changes on 2025-04-08
ngxson merged a226bc7a into master 133 days ago
