llama.cpp
7ecd780b - vulkan: Use fp16 for the flash attention P*V multiplication (#12783)

vulkan: Use fp16 for the flash attention P*V multiplication (#12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
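For context, a minimal C++ sketch of what "fp16 for the P*V multiplication" means: the softmax probabilities P and the value tile V are rounded to half precision before the multiply, while the output row is accumulated in a wider type. This is only an illustration, not the actual Vulkan shader code; the _Float16 type, the tiny tile sizes, and the fp32 accumulator here are assumptions made for the example (requires a compiler with _Float16 support, e.g. recent GCC/Clang).

// Illustrative only: fp16 P*V with fp32 accumulation (not the real shader).
#include <cstdio>
#include <vector>

int main() {
    const int kv = 4, dv = 2;                        // tiny tile: 4 KV entries, head dim 2
    std::vector<float> P = {0.1f, 0.2f, 0.3f, 0.4f}; // softmax probabilities (one query row)
    std::vector<float> V = {1.0f, 2.0f,              // value rows, one per KV entry
                            3.0f, 4.0f,
                            5.0f, 6.0f,
                            7.0f, 8.0f};
    std::vector<float> O(dv, 0.0f);                  // output row, accumulated in fp32

    for (int j = 0; j < kv; ++j) {
        _Float16 p16 = (_Float16)P[j];               // P rounded to fp16 before the multiply
        for (int d = 0; d < dv; ++d) {
            _Float16 v16 = (_Float16)V[j * dv + d];  // V element in fp16
            O[d] += (float)(p16 * v16);              // fp16 product, fp32 accumulation
        }
    }
    printf("O = [%f, %f]\n", O[0], O[1]);
    return 0;
}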