transformers
e09ea879 - GGUF: expose header metadata without materializing tensors

Commit

27 days ago

GGUF: expose header metadata without materializing tensors

`load_gguf_checkpoint` now computes `tensor_quant_types` ({name: quant_type}) and `weight_mapping` unconditionally. Both are read straight off the GGUF header (no tensor data), so a `return_tensors=False` call returns them cheaply. Only the eager `np.copy` of tensor bytes stays behind `return_tensors=True`.

This lets the module-swap plan be built from metadata + renamings alone (pure name resolution, no tensor load, no conversion).

Verified: `return_tensors=False` yields 291 quant types + 12 rules with no `tensors`; full load and AutoConfig via gguf unchanged; 63 fast tests pass.
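To illustrate why a {name: quant_type} map is cheap to build, here is a minimal stdlib-only sketch that reads it from the GGUF tensor-info table without touching any tensor bytes. It is not the code from this commit: `read_tensor_quant_types` and `_tensor_info` are hypothetical helpers, the layout assumed is little-endian GGUF v3, the type table is a small subset, and the sketch deliberately refuses files that carry metadata key/value pairs to stay short.

```python
import io
import struct

# Subset of ggml type ids (assumption: see the GGUF spec for the full table).
GGML_TYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 8: "Q8_0"}


def _read_string(f):
    # GGUF strings are a uint64 length followed by UTF-8 bytes.
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def read_tensor_quant_types(f):
    """Return {tensor_name: quant_type} using only the tensor-info table."""
    magic, _version, n_tensors, n_kv = struct.unpack("<4sIQQ", f.read(24))
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    if n_kv != 0:
        raise NotImplementedError("sketch does not parse metadata key/values")
    quant_types = {}
    for _ in range(n_tensors):
        name = _read_string(f)
        (n_dims,) = struct.unpack("<I", f.read(4))
        f.read(8 * n_dims)  # skip the uint64 dimensions, unused here
        type_id, _offset = struct.unpack("<IQ", f.read(12))
        quant_types[name] = GGML_TYPE_NAMES.get(type_id, f"type_{type_id}")
    return quant_types


def _tensor_info(name, dims, type_id, offset):
    # Hypothetical helper: serialize one tensor-info entry for the demo below.
    raw = name.encode("utf-8")
    return (
        struct.pack("<Q", len(raw)) + raw
        + struct.pack("<I", len(dims))
        + struct.pack(f"<{len(dims)}Q", *dims)
        + struct.pack("<IQ", type_id, offset)
    )


# Build a header-only buffer: two tensor entries, zero bytes of tensor data.
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 0)
header += _tensor_info("token_embd.weight", [64, 32], 8, 0)
header += _tensor_info("output_norm.weight", [64], 0, 2176)

quant_types = read_tensor_quant_types(io.BytesIO(header))
print(quant_types)
```

The demo buffer contains no tensor payload at all, which is the point: name and quantization type live entirely in the header, so a caller that only needs name resolution never has to copy weights.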

References

#45977 - GgufLinear: inference-time GGUF matmul on Apple Silicon — llama.cpp parity

Author

ArthurZucker

Parents

b11c4d7a
