llama.cpp
edd4d9bc - vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029)

Add dequantize4() implementations for Q4_1, Q5_0, Q5_1, and IQ4_NL in the flash attention base shader. Register them in the shader generator and in pipeline creation, and enable them in the scalar/coopmat1 FA support check.