PR #25069 opencl: flash attention improvement

opencl: rework FA kernel for f16 and f32

lhez committed 7 days ago

opencl: flash-attention prefill prepass kernels

lhez committed 7 days ago

opencl: FA kernels for q4_0 and q8_0

lhez committed 7 days ago

opencl: `set_rows` for f32 to q8_0/q4_0

lhez committed 7 days ago

opencl: dequant kernels for q4_0 and q8_0

lhez committed 7 days ago

opencl: add FA tile tuning table with override

lhez committed 7 days ago

opencl: wire host side for FA

lhez committed 7 days ago

opencl: q4_0 MoE tensors are also SOA'ed

lhez committed 7 days ago

opencl: cosmetic fix

lhez committed 7 days ago

opencl: refactor, also clarify some code paths in comments

lhez committed 7 days ago

opencl: fix infinity for `-cl-finite-math-only`

lhez committed 6 days ago

llama.cpp opencl: flash attention improvement #25069 Merged