llama.cpp
opencl: flash attention improvement
#25069
Merged

wanghqc
wanghqc opencl: rework FA kernel for f16 and f32
de973451
wanghqc opencl: flash-attention prefill prepass kernels
153c7faf
wanghqc opencl: FA kernels for q4_0 and q8_0
bd05512d
wanghqc opencl: `set_rows` for f32 to q8_0/q4_0
0fc396dd
wanghqc opencl: dequant kernels for q4_0 and q8_0
1ca6acff
wanghqc opencl: add FA tile tuning table with override
350d26d0
wanghqc opencl: wire host side for FA
6088e8b3
lhez opencl: q4_0 MoE tensors are also SOA'ed
00c1ffbc
lhez opencl: cosmetic fix
5d59efba
lhez opencl: refactor, also clarify some code paths in comments
7110431c
wanghqc requested a review 6 days ago
github-actions added ggml
github-actions added OpenCL
ggml-gh-bot
wanghqc opencl: fix infinity for `-cl-finite-math-only`
faed07b9
lhez
lhez approved these changes on 2026-06-27
lhez requested a review from max-krasnyansky 5 days ago
max-krasnyansky
max-krasnyansky approved these changes on 2026-06-27
max-krasnyansky merged ebd048fc into master 5 days ago
