llama.cpp
a19b5cef
- llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
Commit · 152 days ago
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
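Roughly, the graph-level part of the fix addresses the case where a model runs without a KV cache (embedding models): K and V are then produced directly from the current batch as F32 tensors instead of being read back from the (F16) cache buffers, so they are cast to F16 before being handed to the flash-attention op. A minimal sketch of that idea, assuming the `ggml_flash_attn_ext(q, k, v, mask, scale, max_bias, logit_softcap)` signature; `build_attn_no_cache` and `kq_scale` are illustrative names, not the actual llama.cpp symbols:

```c
#include "ggml.h"

// Sketch only: attention for the no-KV-cache (embeddings) path.
static struct ggml_tensor * build_attn_no_cache(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,   // F32 [n_embd_head, n_tokens, n_head]
        struct ggml_tensor  * k,   // F32, computed from the current batch
        struct ggml_tensor  * v,   // F32, computed from the current batch
        float                 kq_scale) {
    // With no KV cache, K/V never pass through the F16 cache buffers,
    // so they arrive here as F32. Cast them to F16 for the FA op.
    k = ggml_cast(ctx, k, GGML_TYPE_F16);
    v = ggml_cast(ctx, v, GGML_TYPE_F16);

    // flash attention; no mask on this path in the sketch
    return ggml_flash_attn_ext(ctx, q, k, v, NULL, kq_scale, 0.0f, 0.0f);
}
```

The added server test presumably exercises exactly this path: an embedding model served with flash attention enabled, which before this fix fed F32 K/V into the FA code.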
References
#12825 - llama : fix FA when KV cache is not used (i.e. embeddings)
Author
ggerganov
Parents
78a1ba0a