llama.cpp
llama : fix FA when KV cache is not used (i.e. embeddings) #12825 (Merged)

ggerganov merged 3 commits into master from gg/embd-fix-fa
- 3e6d1e4e ggml : FA supports F32 V
- 7cb9ae05 graph : cast KV to F16 when the KV cache is not used
- 997b1b42 server : add test that exercises embeddings with FA enabled
ggerganov requested a review from ngxson 153 days ago
github-actions added labels: examples, python, server, ggml, Apple Metal
ngxson approved these changes on 2025-04-08
ggerganov merged a19b5cef into master 153 days ago
ggerganov deleted the gg/embd-fix-fa branch 153 days ago
