llama.cpp
7ecd780b
- vulkan: Use fp16 for the flash attention P*V multiplication (#12783)
Commit
151 days ago
vulkan: Use fp16 for the flash attention P*V multiplication (#12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
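A minimal C++ sketch of the idea described in the commit message, not the actual Vulkan shader code: the softmax probabilities P are cast to fp16 before the P*V matrix multiply, mirroring the fp16 path used by ggml-cuda. The function name `pv_multiply_fp16` and the use of the `_Float16` compiler extension are assumptions for illustration only.

```cpp
#include <cstdio>
#include <vector>

// Assumes a compiler with _Float16 support (recent GCC/Clang).
using half = _Float16;

// O[r][c] += sum_k P[r][k] * V[k][c], with both operands of each product
// converted to fp16 first, as the commit describes for the P*V multiply.
static void pv_multiply_fp16(const std::vector<std::vector<float>>& P,
                             const std::vector<std::vector<float>>& V,
                             std::vector<std::vector<float>>& O) {
    const size_t rows  = P.size();
    const size_t inner = V.size();
    const size_t cols  = V[0].size();
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (size_t k = 0; k < inner; ++k) {
                // Cast P and V to fp16 before multiplying (illustrative only).
                acc += (float)((half)P[r][k] * (half)V[k][c]);
            }
            O[r][c] += acc;
        }
    }
}

int main() {
    // Tiny example: one row of probabilities times a 2x2 V block.
    std::vector<std::vector<float>> P = {{0.25f, 0.75f}};
    std::vector<std::vector<float>> V = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    std::vector<std::vector<float>> O = {{0.0f, 0.0f}};
    pv_multiply_fp16(P, V, O);
    std::printf("O = [%f, %f]\n", O[0][0], O[0][1]);
    return 0;
}
```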
References
#12783 - vulkan: Use fp16 for the flash attention P*V multiplication
Author
jeffbolznv
Parents
7538246e