llama.cpp
75f3bc94 - vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)

vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)

* use integer dot product for quantized KV flash attention
* small improvements
* fix SHMEM_STAGING indexing
* add missing KV type quants
* fixes
* add supported quants to FA tests
* readd fast paths for <8bit quants
* fix mmq gate and shmem checks