llama.cpp
Hexagon: add support for f16/f32 flash attention, scale, set-rows and improve f16/f32 matmul #18611
Merged


max-krasnyansky requested a review from lhez 32 days ago
github-actions added the ggml label
chraac commented on 2026-01-05
lhez approved these changes on 2026-01-05
chraac approved these changes on 2026-01-06
max-krasnyansky hexagon: improve fp16 matmul and add fp32/fp16 flash-attention
b89c0cd7
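For readers following the flash-attention commits below, a scalar reference of the streaming ("online") softmax that flash-attention-style kernels implement may help. This is a minimal sketch for one query row of a single head, assuming row-major Q/K/V; all names are illustrative, and the actual kernel vectorizes this with HVX and stages data in VTCM.

```c
#include <math.h>
#include <stddef.h>

// One query row against n_kv key/value rows, accumulating the output
// without ever materializing the full score row.
static void fa_row_ref(const float * q,   // [head_dim]
                       const float * k,   // [n_kv][head_dim]
                       const float * v,   // [n_kv][head_dim]
                       float       * out, // [head_dim]
                       size_t n_kv, size_t head_dim, float scale) {
    float m = -INFINITY; // running max of the scaled scores
    float s = 0.0f;      // running sum of exp(score - m)
    for (size_t d = 0; d < head_dim; d++) out[d] = 0.0f;

    for (size_t j = 0; j < n_kv; j++) {
        float dot = 0.0f;
        for (size_t d = 0; d < head_dim; d++) dot += q[d] * k[j*head_dim + d];
        dot *= scale;

        const float m_new = dot > m ? dot : m;
        const float corr  = expf(m - m_new); // rescale earlier accumulator
        const float p     = expf(dot - m_new);

        for (size_t d = 0; d < head_dim; d++)
            out[d] = out[d]*corr + p*v[j*head_dim + d];
        s = s*corr + p;
        m = m_new;
    }
    for (size_t d = 0; d < head_dim; d++) out[d] /= s;
}
```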
max-krasnyansky hexagon: add support for set-rows fp32 -> fp16 with i32/i64 row-idx
b46ce2b5
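A scalar sketch of what this SET_ROWS combination does: scatter fp32 source rows into an fp16 destination at positions given by an i32 or i64 index tensor. `_Float16` is used for brevity where the compiler supports it (ggml has its own `ggml_fp16_t`); the function name is illustrative.

```c
#include <stdint.h>
#include <stddef.h>

static void set_rows_f32_f16(_Float16    * dst, // [n_dst_rows][n_cols]
                             const float * src, // [n_src_rows][n_cols]
                             const void  * idx, // row indices, i32 or i64
                             int           idx_is_i64,
                             size_t n_src_rows, size_t n_cols) {
    for (size_t r = 0; r < n_src_rows; r++) {
        const int64_t row = idx_is_i64 ? ((const int64_t *) idx)[r]
                                       : ((const int32_t *) idx)[r];
        for (size_t c = 0; c < n_cols; c++) {
            // copy one row, converting fp32 -> fp16 on the way
            dst[(size_t) row * n_cols + c] = (_Float16) src[r*n_cols + c];
        }
    }
}
```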
max-krasnyansky hexagon: add support for SCALE fp32
f13896a3
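SCALE is the simplest of the new ops: multiply every element by a constant. A scalar reference (illustrative name; the HVX version processes 32 fp32 lanes per 128-byte vector):

```c
#include <stddef.h>

static void scale_f32_ref(float * dst, const float * src, float s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * s;
    }
}
```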
max-krasnyansky hexagon: replace scalar fp32 -> fp16 copy with HVX
182f9281
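The scalar loop this commit replaces looks roughly like the sketch below (again using `_Float16` where available). Since one 128-byte HVX vector holds 32 fp32 inputs, the vectorized replacement converts an entire vector per iteration instead of one element.

```c
#include <stddef.h>

static void cpy_f32_f16_ref(_Float16 * dst, const float * src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (_Float16) src[i]; // per-element convert; HVX does 32 at once
    }
}
```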
max-krasnyansky hexagon: optimize flash_attn_ext with aligned VTCM buffers and DMA
fed6cb9c
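The aligned-VTCM-plus-DMA change implies a tiling/double-buffering pattern: compute on one tile in fast local memory while the next tile is being copied in. The sketch below shows only the buffering structure, with `memcpy` standing in for the Hexagon user-DMA engine (a plain memcpy does not actually overlap with compute; the real DMA does).

```c
#include <string.h>
#include <stddef.h>

// Process n_tiles tiles of tile_bytes each from slow memory `src`,
// alternating between two halves of a VTCM-like scratch buffer.
static void process_tiles(const char * src, size_t n_tiles, size_t tile_bytes,
                          char * vtcm, // 2*tile_bytes of fast memory
                          void (*compute)(const char *, size_t)) {
    memcpy(vtcm, src, tile_bytes); // "prefetch" tile 0
    for (size_t i = 0; i < n_tiles; i++) {
        const char * cur = vtcm + (i % 2) * tile_bytes;
        if (i + 1 < n_tiles) {
            char * nxt = vtcm + ((i + 1) % 2) * tile_bytes;
            memcpy(nxt, src + (i + 1)*tile_bytes, tile_bytes); // "DMA" next tile
        }
        compute(cur, tile_bytes); // work on the current tile
    }
}
```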
max-krasnyansky hexagon: use aligned mad_f16
bd1c1796
max-krasnyansky hexagon: flash_attn more aligned ops
7e448a57
max-krasnyansky hexagon: optimize scale_f32 hvx helpers
54846e0f
max-krasnyansky hexagon: unroll fa loops
74cce43d
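"Unrolling" here means processing several elements per loop iteration with independent accumulators, so the compiler can keep more HVX registers in flight and hide latency. A scalar 4x-unrolled dot product illustrates the shape of the transformation:

```c
#include <stddef.h>

static float dot_unrolled4(const float * a, const float * b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f; // independent chains
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i+0]*b[i+0];
        s1 += a[i+1]*b[i+1];
        s2 += a[i+2]*b[i+2];
        s3 += a[i+3]*b[i+3];
    }
    for (; i < n; i++) s0 += a[i]*b[i]; // scalar tail
    return (s0 + s1) + (s2 + s3);
}
```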
max-krasnyansky hexagon: remove unused set-rows log
a2e5f675
max-krasnyansky hexagon: flash_attn_ext add support for DMAing Q
af036638
max-krasnyansky hexagon: fix handling of NaNs in HVX dot products
203c782f
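One common source of NaNs in attention kernels: masked positions carry a score of -INF, and when the running max is also -INF, `expf(-INF - -INF)` evaluates `expf(NaN)` and poisons the accumulator. The guard below illustrates that class of fix; it is an assumption about the failure mode, not the actual HVX change in this commit.

```c
#include <math.h>

// exp(x - m) with the convention that a fully-masked score (-INF)
// contributes zero weight instead of producing NaN when m is also -INF.
static inline float exp_diff_guarded(float x, float m) {
    if (x == -INFINITY) return 0.0f; // masked: no contribution
    return expf(x - m);
}
```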
max-krasnyansky hexagon: clean up spad allocation in flash-attn
4d1c7fed
max-krasnyansky hexagon: improve fp16/fp32 matmul
180ce9d3
max-krasnyansky hexagon: fix HVX_ARCH check
4c184bd7
max-krasnyansky hexagon: matmul cleanup and fp16 fixes
94d676bb
max-krasnyansky hexagon: fix fp16 x fp16 matmuls and some minor refactoring
0015b6a1
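A common correctness pitfall with fp16 x fp16 matmuls is accumulating in fp16; the usual remedy is fp32 accumulation. A scalar reference of that pattern (layout and names are illustrative, not the backend's actual kernel; here both operands are row-major and the second is indexed by output column):

```c
#include <stddef.h>

static void matmul_f16_ref(float * C, const _Float16 * A, const _Float16 * B,
                           size_t M, size_t N, size_t K) {
    for (size_t m = 0; m < M; m++) {
        for (size_t n = 0; n < N; n++) {
            float acc = 0.0f; // fp32 accumulator keeps fp16 GEMMs accurate
            for (size_t k = 0; k < K; k++) {
                acc += (float) A[m*K + k] * (float) B[n*K + k];
            }
            C[m*N + n] = acc;
        }
    }
}
```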
max-krasnyansky hexagon: add support for GET_ROWS f32 -> f32
5385618f
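GET_ROWS is the gather dual of SET_ROWS: copy out the rows named by an index tensor. A scalar f32 -> f32 reference (illustrative names):

```c
#include <stdint.h>
#include <stddef.h>

static void get_rows_f32_ref(float * dst, const float * src,
                             const int32_t * idx,
                             size_t n_idx, size_t n_cols) {
    for (size_t r = 0; r < n_idx; r++) {
        const float * srow = src + (size_t) idx[r] * n_cols; // gather source row
        for (size_t c = 0; c < n_cols; c++) {
            dst[r*n_cols + c] = srow[c];
        }
    }
}
```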
max-krasnyansky hexagon: optimize set-rows threading
f854b2c0
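A typical way to thread a row-wise op like set-rows is to hand thread `ith` of `nth` a contiguous slice of rows. The sketch shows the partitioning arithmetic only; the actual split in the Hexagon backend may differ.

```c
#include <stddef.h>

static void row_range(size_t n_rows, int ith, int nth,
                      size_t * r0, size_t * r1) {
    const size_t per = (n_rows + (size_t) nth - 1) / (size_t) nth; // ceil div
    size_t begin = (size_t) ith * per;
    size_t end   = begin + per;
    if (begin > n_rows) begin = n_rows; // trailing threads may get no rows
    if (end   > n_rows) end   = n_rows;
    *r0 = begin;
    *r1 = end;
}
```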
max-krasnyansky hexagon: update adb/run-bench.sh to properly support experimental and…
e12c4753
max-krasnyansky force-pushed from 8dc2948b to e12c4753 31 days ago
github-actions added the script label
max-krasnyansky hexagon: flash_attn use aligned vectors for dot products
8b210174
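HVX vectors are 128 bytes wide, and aligned loads/stores are cheaper than unaligned ones, which is what the "aligned vectors" commits are about. One portable way to obtain suitably aligned scratch buffers (illustrative; the backend manages its VTCM buffers differently):

```c
#include <stdlib.h>

// 128-byte-aligned buffer for n floats. posix_memalign is POSIX,
// not Hexagon-specific; shown only to demonstrate the alignment constraint.
static float * alloc_aligned_f32(size_t n) {
    void * p = NULL;
    if (posix_memalign(&p, 128, n * sizeof(float)) != 0) return NULL;
    return (float *) p;
}
```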
max-krasnyansky merged 95ea9e08 into master 31 days ago
