llama.cpp
a19b5cef
- llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
Commit · 152 days ago
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
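Roughly, the graph-level part of the fix addresses the case where a model runs without a KV cache (embedding models): K and V are then produced directly from the current batch as F32 tensors instead of being read back from the (F16) cache buffers, so they are cast to F16 before being handed to the flash-attention op. A minimal sketch of that idea, assuming the `ggml_flash_attn_ext(q, k, v, mask, scale, max_bias, logit_softcap)` signature; `build_attn_no_cache` and `kq_scale` are illustrative names, not the actual llama.cpp symbols:

```c
#include "ggml.h"

// Sketch only: attention for the no-KV-cache (embeddings) path.
static struct ggml_tensor * build_attn_no_cache(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,   // F32 [n_embd_head, n_tokens, n_head]
        struct ggml_tensor  * k,   // F32, computed from the current batch
        struct ggml_tensor  * v,   // F32, computed from the current batch
        float                 kq_scale) {
    // With no KV cache, K/V never pass through the F16 cache buffers,
    // so they arrive here as F32. Cast them to F16 for the FA op.
    k = ggml_cast(ctx, k, GGML_TYPE_F16);
    v = ggml_cast(ctx, v, GGML_TYPE_F16);

    // flash attention; no mask on this path in the sketch
    return ggml_flash_attn_ext(ctx, q, k, v, NULL, kq_scale, 0.0f, 0.0f);
}
```

The added server test presumably exercises exactly this path: an embedding model served with flash attention enabled, which before this fix fed F32 K/V into the FA code.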
References
#12825 - llama : fix FA when KV cache is not used (i.e. embeddings)
Author
ggerganov
Parents
78a1ba0a