llama.cpp
7ecd780b
- vulkan: Use fp16 for the flash attention P*V multiplication (#12783)
Commit
151 days ago
vulkan: Use fp16 for the flash attention P*V multiplication (#12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
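A minimal C++ sketch of the idea described in the commit message, not the actual Vulkan shader code: the softmax probabilities P are cast to fp16 before the P*V matrix multiply, mirroring the fp16 path used by ggml-cuda. The function name `pv_multiply_fp16` and the use of the `_Float16` compiler extension are assumptions for illustration only.

```cpp
#include <cstdio>
#include <vector>

// Assumes a compiler with _Float16 support (recent GCC/Clang).
using half = _Float16;

// O[r][c] += sum_k P[r][k] * V[k][c], with both operands of each product
// converted to fp16 first, as the commit describes for the P*V multiply.
static void pv_multiply_fp16(const std::vector<std::vector<float>>& P,
                             const std::vector<std::vector<float>>& V,
                             std::vector<std::vector<float>>& O) {
    const size_t rows  = P.size();
    const size_t inner = V.size();
    const size_t cols  = V[0].size();
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (size_t k = 0; k < inner; ++k) {
                // Cast P and V to fp16 before multiplying (illustrative only).
                acc += (float)((half)P[r][k] * (half)V[k][c]);
            }
            O[r][c] += acc;
        }
    }
}

int main() {
    // Tiny example: one row of probabilities times a 2x2 V block.
    std::vector<std::vector<float>> P = {{0.25f, 0.75f}};
    std::vector<std::vector<float>> V = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    std::vector<std::vector<float>> O = {{0.0f, 0.0f}};
    pv_multiply_fp16(P, V, O);
    std::printf("O = [%f, %f]\n", O[0][0], O[0][1]);
    return 0;
}
```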
References
#12783 - vulkan: Use fp16 for the flash attention P*V multiplication
Author
jeffbolznv
Parents
7538246e