llama.cpp
a19b5cef - llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
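The second bullet is the graph-level part of the fix: on the embeddings-only path there is no KV cache backing the attention, so K and V arrive as F32 rather than the F16 the FA path normally sees from the cache. Below is a minimal standalone sketch of that pattern using the public ggml API (`ggml_cast`, `ggml_flash_attn_ext`); the tensor shapes, fill values, and scale are illustrative assumptions, not taken from the commit, and the real fix applies the cast inside llama.cpp's graph build rather than in a toy program like this.

```c
// Sketch: cast F32 K/V to F16 before ggml_flash_attn_ext, mirroring the
// graph-level fix in this commit. All dimensions here are hypothetical.
#include "ggml.h"
#include "ggml-cpu.h"

#include <math.h>
#include <stdio.h>

int main(void) {
    const int d = 64, n_tokens = 4, n_head = 4; // illustrative sizes

    struct ggml_init_params params = {
        /*.mem_size   =*/ 64 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // without a KV cache, Q/K/V come straight out of the graph as F32
    struct ggml_tensor * q = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d, n_tokens, n_head);
    struct ggml_tensor * k = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d, n_tokens, n_head);
    struct ggml_tensor * v = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, d, n_tokens, n_head);

    // fill with dummy data so the graph has something to compute
    for (int64_t i = 0; i < ggml_nelements(q); ++i) {
        ((float *) q->data)[i] = 0.1f;
        ((float *) k->data)[i] = 0.2f;
        ((float *) v->data)[i] = 0.3f;
    }

    // the fix: cast K and V to F16, as the KV cache would normally provide
    struct ggml_tensor * k16 = ggml_cast(ctx, k, GGML_TYPE_F16);
    struct ggml_tensor * v16 = ggml_cast(ctx, v, GGML_TYPE_F16);

    const float scale = 1.0f / sqrtf((float) d);
    struct ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k16, v16,
            /*mask=*/NULL, scale, /*max_bias=*/0.0f, /*logit_softcap=*/0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("out[0] = %f\n", ((float *) out->data)[0]);
    ggml_free(ctx);
    return 0;
}
```

The first bullet ("ggml : FA supports F32 V") relaxes the kernel so the cast is not required for V in all cases, but casting both K and V to F16 keeps the non-cached path consistent with what the KV cache feeds the FA kernel.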