opencl: flash attention improvement #25069
opencl: rework FA kernel for f16 and f32
de973451
opencl: flash-attention prefill prepass kernels
153c7faf
opencl: FA kernels for q4_0 and q8_0
bd05512d
opencl: `set_rows` for f32 to q8_0/q4_0
0fc396dd
opencl: dequant kernels for q4_0 and q8_0
1ca6acff
opencl: add FA tile tuning table with override
350d26d0
opencl: wire host side for FA
6088e8b3
opencl: q4_0 MoE tensors are also SOA'ed
00c1ffbc
opencl: cosmetic fix
5d59efba
opencl: refactor, also clarify some code paths in comments
7110431c
wanghqc requested a review 6 days ago
opencl: fix infinity for `-cl-finite-math-only`
faed07b9
lhez approved these changes on 2026-06-27
Assignees
No one assigned