llama.cpp
llama : fix FA when KV cache is not used (i.e. embeddings) #12825 (Merged)

ggerganov merged 3 commits into master from gg/embd-fix-fa
- 3e6d1e4e ggml : FA supports F32 V
- 7cb9ae05 graph : cast KV to F16 when the KV cache is not used
- 997b1b42 server : add test that exercises embeddings with FA enabled
ggerganov requested a review from ngxson 153 days ago
github-actions added labels: examples, python, server, ggml, Apple Metal
ngxson approved these changes on 2025-04-08
ggerganov merged a19b5cef into master 153 days ago
ggerganov deleted the gg/embd-fix-fa branch 153 days ago
