llama.cpp
opencl: flash attention improvement
#25069
Merged
Commits (11)
opencl: rework FA kernel for f16 and f32 (lhez committed 7 days ago)
opencl: flash-attention prefill prepass kernels (lhez committed 7 days ago)
opencl: FA kernels for q4_0 and q8_0 (lhez committed 7 days ago)
opencl: `set_rows` for f32 to q8_0/q4_0 (lhez committed 7 days ago)
opencl: dequant kernels for q4_0 and q8_0 (lhez committed 7 days ago)
opencl: add FA tile tuning table with override (lhez committed 7 days ago)
opencl: wire host side for FA (lhez committed 7 days ago)
opencl: q4_0 MoE tensors are also SOA'ed (lhez committed 7 days ago)
opencl: cosmetic fix (lhez committed 7 days ago)
opencl: refactor, also clarify some code paths in comments (lhez committed 7 days ago)
opencl: fix infinity for `-cl-finite-math-only` (lhez committed 6 days ago)