GgufLinear: inference-time GGUF matmul on Apple Silicon — llama.cpp parity #45977
ArthurZucker force pushed from 56d3847a to cb6ba169 37 days ago
Add GgufLinear: inference-time GGUF matmul on Apple Silicon (69d0f977)
ArthurZucker force pushed from 5134799c to 69d0f977 33 days ago
ArthurZucker changed the base branch from update-gguf to main 33 days ago
doc (d75a23bc)
GGUF cleanup — align with the FP8 quantizer pattern (5635106a)
GGUF: target-aware GGUFDequantize drops the dense-Linear byte-copy (a23cae8c)
GGUF: route MoE experts through the WeightConverter API too (bbb34dba)
GGUF: register Mixtral / DeepSeek-V3 in MODEL_TYPE_TO_GGUF_EXPERTS (d4f6d40e)
GGUF: GgufExperts matches MixtralExperts layout — merge converter jus… (1a820f9f)
GGUF cleanup pass: drop modeling_utils side-path + fix MoE config att… (daa4d78c)
gguf_kernels: drop snapshot_download fallback, use kernels.get_kernel… (6c08eda1)
MODEL_TYPE_TO_GGUF_EXPERTS: sync with the MoE entries in _GGUF_ARCH_C… (c09fd9ce)
MODEL_TYPE_TO_GGUF_EXPERTS: cover all MoE archs that quantize to GGUF (992843d2)
GGUF: own its experts interface, drop entries from base ExpertsInterface (307aaab9)
GGUF cleanup: drop bespoke helpers, mirror FP8 conventions tighter (c389d753)
GGUF: explicit kernel refs on each module, drop bind helpers (ee2eef2c)
GGUF: safetensors save round-trip via module_quant_types (a6c52296)
up (9efcab00)
cleanup (42c7e444)
up (2e067f1b)
updates (f56b072c)
type (64067535)
GGUF: per-arch rope/norm fixes + writable mmap + i-quant fallback (c43f20c6)
Merge remote-tracking branch 'origin/main' into gguf-matmul-kernels (bdcfef8b)
GGUF: lazy-import torch in quantizer_gguf to keep PIL-only CI happy (5ea30d9b)
Merge branch 'main' into gguf-matmul-kernels (18cd7219)
Merge branch 'main' of github.com:huggingface/transformers into gguf-… (57ba8842)
Merge branch 'main' of github.com:huggingface/transformers into gguf-… (c76d77b9)
GGUF: simplify quantizer plumbing and fix swap-plan rename bug (ccbcb1cf)
GGUF: fix MPS dequant byte-corruption and make norm de-offset data-dr… (8d87b3c4)
GGUF: de-offset norms via keep-in-fp32 instead of pre-applying (b11c4d7a)
GGUF: expose header metadata without materializing tensors (e09ea879)
GGUF: uniform meta-time swap, no rename, bind kernels post-load (e0aae61b)
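Several commits in this list (the target-aware GGUFDequantize, the MPS dequant byte-corruption fix) deal with expanding packed GGUF blocks back into dense weights. As an illustration only, not this PR's code, here is a minimal pure-Python sketch for the simplest llama.cpp block format, Q8_0. It assumes ggml's `block_q8_0` layout: a little-endian fp16 scale `d` followed by 32 int8 quants (34 bytes per block), with each weight recovered as `d * q`. `dequantize_q8_0` is a hypothetical name.

```python
import struct

QK8_0 = 32               # weights per Q8_0 block (ggml's QK8_0)
BLOCK_BYTES = 2 + QK8_0  # fp16 scale + 32 int8 quants = 34 bytes


def dequantize_q8_0(raw: bytes) -> list[float]:
    """Hypothetical helper: expand packed Q8_0 blocks into Python floats."""
    if len(raw) % BLOCK_BYTES:
        raise ValueError(f"Q8_0 data must be a multiple of {BLOCK_BYTES} bytes")
    out: list[float] = []
    for offset in range(0, len(raw), BLOCK_BYTES):
        # '<e' = little-endian IEEE 754 half-precision scale d
        (d,) = struct.unpack_from("<e", raw, offset)
        # the 32 signed 8-bit quants sit right after the 2-byte scale
        qs = struct.unpack_from(f"<{QK8_0}b", raw, offset + 2)
        out.extend(d * q for q in qs)
    return out


# Usage: one block with scale 0.5 and quants -16..15
block = struct.pack("<e", 0.5) + struct.pack(f"<{QK8_0}b", *range(-16, 16))
print(dequantize_q8_0(block)[:3])  # [-8.0, -7.5, -7.0]
```

A real loader would do this vectorized on device tensors; the per-block loop here only makes the byte layout explicit, which is the part that goes wrong when bytes are copied or reinterpreted incorrectly.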
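One commit exposes header metadata without materializing tensors. A GGUF file opens with a fixed-size prefix, so that part can be read without touching any weights. The sketch below is hypothetical and not taken from the PR. It reads only that prefix, assuming GGUF v2 or later: the magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts, all little-endian. `read_gguf_header` and `GGUFHeader` are illustrative names.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"
PREFIX_BYTES = 24  # magic (4) + version (4) + tensor_count (8) + metadata_kv_count (8)


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Hypothetical helper: parse only the fixed-size GGUF prefix."""
    prefix = f.read(PREFIX_BYTES)
    if len(prefix) < PREFIX_BYTES or prefix[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # '<IQQ' = little-endian uint32 version, uint64 tensor count, uint64 KV count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", prefix, 4)
    if version < 2:
        # GGUF v1 stored the two counts as uint32; not handled in this sketch
        raise ValueError(f"unsupported GGUF version {version}")
    return GGUFHeader(version, tensor_count, kv_count)


# Usage: a synthetic prefix standing in for open(path, "rb") on a real .gguf file
buf = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24))
print(read_gguf_header(buf))
# GGUFHeader(version=3, tensor_count=291, metadata_kv_count=24)
```

The metadata key/value pairs and tensor-info records that follow the prefix are variable-length and type-tagged, so walking them needs a small parser; the tensor data region is never read either way.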
Assignees: No one assigned