PR #25069 opencl: flash attention improvement

max-krasnyansky merged 11 commits into ggml-org:master from qualcomm:hq/fa-rework

opencl: rework FA kernel for f16 and f32

de973451

opencl: flash-attention prefill prepass kernels

153c7faf

opencl: FA kernels for q4_0 and q8_0

bd05512d

opencl: `set_rows` for f32 to q8_0/q4_0

0fc396dd

opencl: dequant kernels for q4_0 and q8_0

1ca6acff

opencl: add FA tile tuning table with override

350d26d0

opencl: wire host side for FA

6088e8b3

opencl: q4_0 MoE tensors are also SOA'ed

00c1ffbc

opencl: cosmetic fix

5d59efba

opencl: refactor, also clarify some code paths in comments

7110431c

wanghqc requested a review 6 days ago

github-actions added the ggml and OpenCL labels

opencl: fix infinity for `-cl-finite-math-only`

faed07b9

lhez approved these changes on 2026-06-27

lhez requested a review from max-krasnyansky 5 days ago

max-krasnyansky approved these changes on 2026-06-27

max-krasnyansky merged ebd048fc into master 5 days ago

Reviewers

max-krasnyansky

lhez

Assignees

No one assigned

Labels

ggml, OpenCL

Milestone

No milestone