transformers
GgufLinear: inference-time GGUF matmul on Apple Silicon — llama.cpp parity
#45977
Open

ArthurZucker wants to merge 31 commits into main from gguf-matmul-kernels
ArthurZucker force pushed from 56d3847a to cb6ba169 37 days ago
ArthurZucker Add GgufLinear: inference-time GGUF matmul on Apple Silicon
69d0f977
ArthurZucker force pushed from 5134799c to 69d0f977 33 days ago
ArthurZucker changed the base branch from update-gguf to main 33 days ago
ArthurZucker doc
d75a23bc
ArthurZucker GGUF cleanup — align with the FP8 quantizer pattern
5635106a
ArthurZucker GGUF: target-aware GGUFDequantize drops the dense-Linear byte-copy
a23cae8c
ArthurZucker GGUF: route MoE experts through the WeightConverter API too
bbb34dba
ArthurZucker GGUF: register Mixtral / DeepSeek-V3 in MODEL_TYPE_TO_GGUF_EXPERTS
d4f6d40e
ArthurZucker GGUF: GgufExperts matches MixtralExperts layout — merge converter jus…
1a820f9f
ArthurZucker GGUF cleanup pass: drop modeling_utils side-path + fix MoE config att…
daa4d78c
ArthurZucker gguf_kernels: drop snapshot_download fallback, use kernels.get_kernel…
6c08eda1
ArthurZucker MODEL_TYPE_TO_GGUF_EXPERTS: sync with the MoE entries in _GGUF_ARCH_C…
c09fd9ce
ArthurZucker MODEL_TYPE_TO_GGUF_EXPERTS: cover all MoE archs that quantize to GGUF
992843d2
ArthurZucker GGUF: own its experts interface, drop entries from base ExpertsInterface
307aaab9
ArthurZucker GGUF cleanup: drop bespoke helpers, mirror FP8 conventions tighter
c389d753
ArthurZucker GGUF: explicit kernel refs on each module, drop bind helpers
ee2eef2c
ArthurZucker GGUF: safetensors save round-trip via module_quant_types
a6c52296
ArthurZucker up
9efcab00
ArthurZucker cleanup
42c7e444
ArthurZucker up
2e067f1b
ArthurZucker updates
f56b072c
ArthurZucker type
64067535
ArthurZucker GGUF: per-arch rope/norm fixes + writable mmap + i-quant fallback
c43f20c6
ArthurZucker Merge remote-tracking branch 'origin/main' into gguf-matmul-kernels
bdcfef8b
ArthurZucker GGUF: lazy-import torch in quantizer_gguf to keep PIL-only CI happy
5ea30d9b
ArthurZucker Merge branch 'main' into gguf-matmul-kernels
18cd7219
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into gguf-…
57ba8842
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into gguf-…
c76d77b9
ArthurZucker GGUF: simplify quantizer plumbing and fix swap-plan rename bug
ccbcb1cf
ArthurZucker GGUF: fix MPS dequant byte-corruption and make norm de-offset data-dr…
8d87b3c4
ArthurZucker GGUF: de-offset norms via keep-in-fp32 instead of pre-applying
b11c4d7a
ArthurZucker GGUF: expose header metadata without materializing tensors
e09ea879
ArthurZucker GGUF: uniform meta-time swap, no rename, bind kernels post-load
e0aae61b