Hexagon: add support for f16/f32 flash attention, scale, set-rows and improve f16/f32 matmul #18611
chraac commented on 2026-01-05
lhez approved these changes on 2026-01-05
chraac approved these changes on 2026-01-06
hexagon: improve fp16 matmul and add fp32/fp16 flash-attention
b89c0cd7
hexagon: add support for set-rows fp32 -> fp16 with i32/i64 row-idx
b46ce2b5
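The set-rows op scatters fp32 source rows into an fp16 destination at positions given by an integer row-index tensor. A scalar reference sketch of the fp32 → fp16 path with 64-bit indices (names and the conversion helper are illustrative; the HVX kernel converts whole vectors at a time, and this converter skips denormals):

```c
#include <stdint.h>
#include <string.h>

/* Minimal fp32 -> fp16 bit conversion: round-to-nearest (ties up),
 * denormals flushed to zero. A scalar stand-in for the HVX path. */
static uint16_t f32_to_f16(float f) {
    uint32_t x; memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000;
    int32_t  exp  = (int32_t)((x >> 23) & 0xff) - 127 + 15;
    uint32_t mant = x & 0x7fffff;
    if (exp >= 31) return (uint16_t)(sign | 0x7c00);  /* inf / overflow */
    if (exp <= 0)  return (uint16_t)sign;             /* flush to zero  */
    uint16_t h = (uint16_t)(sign | (uint32_t)(exp << 10) | (mant >> 13));
    if (mant & 0x1000) h++;  /* round up; carry into exponent is fine  */
    return h;
}

/* Scatter fp32 source rows into an fp16 destination; row_idx[r] gives
 * the destination row for source row r. Hypothetical signature. */
static void set_rows_f32_f16(uint16_t *dst, const float *src,
                             const int64_t *row_idx, int n_rows, int n_cols) {
    for (int r = 0; r < n_rows; r++) {
        uint16_t    *d = dst + row_idx[r] * n_cols;
        const float *s = src + (int64_t)r * n_cols;
        for (int c = 0; c < n_cols; c++) d[c] = f32_to_f16(s[c]);
    }
}
```

The i32 variant is identical apart from the index type.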
hexagon: add support for SCALE fp32
f13896a3
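SCALE is an elementwise multiply by a scalar. Its semantics reduce to the loop below (a scalar reference only — the HVX helpers tuned in a later commit process full 128-byte vectors with aligned loads; the function name is illustrative):

```c
/* Scalar reference for the SCALE op: dst[i] = src[i] * s. */
static void scale_f32(float *dst, const float *src, float s, int n) {
    for (int i = 0; i < n; i++) dst[i] = src[i] * s;
}
```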
hexagon: replace scalar fp32 -> fp16 copy with HVX
182f9281
hexagon: optimize flash_attn_ext with aligned VTCM buffers and DMA
fed6cb9c
hexagon: use aligned mad_f16
bd1c1796
hexagon: flash_attn more aligned ops
7e448a57
hexagon: optimize scale_f32 hvx helpers
54846e0f
hexagon: unroll fa loops
74cce43d
hexagon: remove unused set-rows log
a2e5f675
hexagon: flash_attn_ext add support for DMAing Q
af036638
hexagon: fix handling of NaNs in HVX dot products
203c782f
hexagon: cleanup spad allocation in flash-attn
4d1c7fed
hexagon: improve fp16/fp32 matmul
180ce9d3
hexagon: fix HVX_ARCH check
4c184bd7
hexagon: matmul cleanup and fp16 fixes
94d676bb
hexagon: fix fp16 x fp16 matmuls and some minor refactoring
0015b6a1
hexagon: add support for GET_ROWS f32 -> f32
5385618f
hexagon: optimize set-rows threading
f854b2c0
hexagon: update adb/run-bench.sh to properly support experimental and…
e12c4753
hexagon: flash_attn use aligned vectors for dot products
8b210174