Hexagon: add support for f16/f32 flash attention, scale, set-rows and improve f16/f32 matmul #18611
chraac commented on 2026-01-05
lhez approved these changes on 2026-01-05
chraac approved these changes on 2026-01-06
hexagon: improve fp16 matmul and add fp32/fp16 flash-attention
b89c0cd7
hexagon: add support for set-rows fp32 -> fp16 with i32/i64 row-idx
b46ce2b5
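The set-rows op scatters fp32 source rows into an fp16 destination at positions given by an integer row-index tensor. A scalar reference sketch of the fp32 → fp16 path with 64-bit indices (names and the conversion helper are illustrative; the HVX kernel converts whole vectors at a time, and this converter skips denormals):

```c
#include <stdint.h>
#include <string.h>

/* Minimal fp32 -> fp16 bit conversion: round-to-nearest (ties up),
 * denormals flushed to zero. A scalar stand-in for the HVX path. */
static uint16_t f32_to_f16(float f) {
    uint32_t x; memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000;
    int32_t  exp  = (int32_t)((x >> 23) & 0xff) - 127 + 15;
    uint32_t mant = x & 0x7fffff;
    if (exp >= 31) return (uint16_t)(sign | 0x7c00);  /* inf / overflow */
    if (exp <= 0)  return (uint16_t)sign;             /* flush to zero  */
    uint16_t h = (uint16_t)(sign | (uint32_t)(exp << 10) | (mant >> 13));
    if (mant & 0x1000) h++;  /* round up; carry into exponent is fine  */
    return h;
}

/* Scatter fp32 source rows into an fp16 destination; row_idx[r] gives
 * the destination row for source row r. Hypothetical signature. */
static void set_rows_f32_f16(uint16_t *dst, const float *src,
                             const int64_t *row_idx, int n_rows, int n_cols) {
    for (int r = 0; r < n_rows; r++) {
        uint16_t    *d = dst + row_idx[r] * n_cols;
        const float *s = src + (int64_t)r * n_cols;
        for (int c = 0; c < n_cols; c++) d[c] = f32_to_f16(s[c]);
    }
}
```

The i32 variant is identical apart from the index type.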
hexagon: add support for SCALE fp32
f13896a3
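SCALE is an elementwise multiply by a scalar. Its semantics reduce to the loop below (a scalar reference only — the HVX helpers tuned in a later commit process full 128-byte vectors with aligned loads; the function name is illustrative):

```c
/* Scalar reference for the SCALE op: dst[i] = src[i] * s. */
static void scale_f32(float *dst, const float *src, float s, int n) {
    for (int i = 0; i < n; i++) dst[i] = src[i] * s;
}
```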
hexagon: replace scalar fp32 -> fp16 copy with HVX
182f9281
hexagon: optimize flash_attn_ext with aligned VTCM buffers and DMA
fed6cb9c
hexagon: use aligned mad_f16
bd1c1796
hexagon: flash_attn more aligned ops
7e448a57
hexagon: optimize scale_f32 hvx helpers
54846e0f
hexagon: unroll fa loops
74cce43d
hexagon: remove unused set-rows log
a2e5f675
hexagon: flash_attn_ext add support for DMAing Q
af036638
hexagon: fix handling of NaNs in HVX dot products
203c782f
hexagon: cleanup spad allocation in flash-attn
4d1c7fed
hexagon: improve fp16/fp32 matmul
180ce9d3
hexagon: fix HVX_ARCH check
4c184bd7
hexagon: matmul cleanup and fp16 fixes
94d676bb
hexagon: fix fp16 x fp16 matmuls and some minor refactoring
0015b6a1
hexagon: add support for GET_ROWS f32 -> f32
5385618f
hexagon: optimize set-rows threading
f854b2c0
hexagon: update adb/run-bench.sh to properly support experimental and…
e12c4753
hexagon: flash_attn use aligned vectors for dot products
8b210174