llama.cpp
opencl: flash attention improvement
#25069
Merged

Commits
  • opencl: rework FA kernel for f16 and f32
    lhez committed 7 days ago
  • opencl: flash-attention prefill prepass kernels
    lhez committed 7 days ago
  • opencl: FA kernels for q4_0 and q8_0
    lhez committed 7 days ago
  • opencl: `set_rows` for f32 to q8_0/q4_0
    lhez committed 7 days ago
  • opencl: dequant kernels for q4_0 and q8_0
    lhez committed 7 days ago
  • opencl: add FA tile tuning table with override
    lhez committed 7 days ago
  • opencl: wire host side for FA
    lhez committed 7 days ago
  • opencl: q4_0 MoE tensors are also SOA'ed
    lhez committed 7 days ago
  • opencl: cosmetic fix
    lhez committed 7 days ago
  • opencl: refactor, also clarify some code paths in comments
    lhez committed 7 days ago
  • opencl: fix infinity for `-cl-finite-math-only`
    lhez committed 6 days ago