diffusers
ca79f8cc - GGUF fix for unquantized types when using unquantize kernels (#12498)

Commit

93 days ago

GGUF fix for unquantized types when using unquantize kernels (#12498)

Even if the `qweight_type` is one of the `UNQUANTIZED_TYPES`, qweight still has to be "dequantized" because it is stored as an 8-bit tensor. Without this step, the following matmul fails with a shape mismatch.

Side notes:
- Why isn't DIFFUSERS_GGUF_CUDA_KERNELS on by default? It's significantly faster and only used when installed.
- https://huggingface.co/Isotr0py/ggml/tree/main/build has no build for torch 2.8 (or the upcoming 2.9). Who can we contact to make such a build?

Co-authored-by: YiYi Xu <yixu310@gmail.com>
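The shape mismatch described in the commit message can be illustrated with a small standard-library sketch. It is not the diffusers code: it assumes a float16 weight, and the helper names (`store_as_bytes`, `reinterpret_f16`, `matmul`) are hypothetical. It only shows why a tensor held as raw 8-bit values has to be reinterpreted as its real dtype before a matmul, even though no actual quantization is involved.

```python
import struct

def store_as_bytes(rows):
    """Pack each row of floats as little-endian float16 (2 bytes per element)."""
    return [struct.pack(f"<{len(r)}e", *r) for r in rows]

def reinterpret_f16(byte_rows):
    """Reinterpret each row of raw bytes as float16 values."""
    return [list(struct.unpack(f"<{len(b) // 2}e", b)) for b in byte_rows]

def matmul(x, w):
    """Compute x @ w.T for x of shape (n, k) and w of shape (m, k)."""
    if len(x[0]) != len(w[0]):
        raise ValueError(f"shape mismatch: {len(x[0])} vs {len(w[0])}")
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]

# Logical weight of shape (2, 3); every value is exactly representable in float16.
weight = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
raw = store_as_bytes(weight)      # stored form: 2 rows of 6 bytes each
x = [[1.0, 1.0, 1.0]]             # activation of shape (1, 3)

# Using the 8-bit storage directly: the inner dimensions are 3 and 6.
try:
    matmul(x, [list(b) for b in raw])
    mismatch = False
except ValueError:
    mismatch = True

# Reinterpreting the bytes first restores the (2, 3) shape, and the matmul works.
restored = reinterpret_f16(raw)
result = matmul(x, restored)
```

Here the byte-level rows are twice as wide as the logical rows, which is the kind of mismatch the fix avoids by always converting the stored tensor back to its real dtype first.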

References

#12498 - GGUF fix for unquantized types when using unquantize kernels

Author

dxqb

Parents

99e2cfff
