llama.cpp
Add Hexagon Matrix Extensions (HMX) for Hexagon NPU backend
#20693
Merged

njsyw1997
njsyw1997 requested a review 23 days ago
github-actions added ggml
github-actions added Hexagon
chraac commented on 2026-03-18
chraac commented on 2026-03-18
njsyw1997 migrate(vtcm): unify VTCM management for HMX merge
cca1cb87
njsyw1997 migrate(repack): replace x4x2 with HMX tile-permuted super-block format
7e641b5e
njsyw1997 migrate(dma): add dma_queue_push_1d() convenience wrapper for HMX ops
5863fcf5
njsyw1997 migrate(hmx): reorganize HMX files into htp/hmx/ and simplify HMX loc…
2f628d1c
njsyw1997 migrate(hmx-infra): consolidate HMX infrastructure into htp_context
5426a6e0
njsyw1997 migrate(flash-attn): remove HTP_EXP2_TABLE_COPIES, use single exp2 table
56bbeda2
njsyw1997 migrate(dsp-main): add HMX priority dispatch in packet_callback
1fa952b0
njsyw1997 migrate(cmake-dsp): add HMX source files and -mhmx for v73+ skels
6e53fb50
njsyw1997 migrate(hmx-ops): fix compile errors in HMX ops for ggml struct compa…
ae98cd24
njsyw1997 hmx: set Q/O element type to fp16 for flash attention
334bf1d8
njsyw1997 hexagon: unify HMX weight format to x4x2, add IQ4_NL and DSP-side fal…
e3a11942
njsyw1997 Enhance HMX debugging capabilities with new tile dumping functions
a476c368
njsyw1997 OK for small mat mul
f21a276e
njsyw1997 hexagon: fix UDMA roiwidth 16-bit overflow in HMX matmul DMA transfers
d1b5384a
njsyw1997 hexagon: remove HMX RMS norm fallback and re-enable matmul pipeline
f7b4c499
njsyw1997 hexagon: guard all HMX matmul DMA transfers against UDMA 16-bit field…
ee5d6c3c
njsyw1997 hexagon: multithread activation/output transfer and add HMX matmul fa…
4106932c
njsyw1997 [todo]: dynamic alloc vtcm, cause prefill regression.
28df959f
njsyw1997 hexagon: constrain HMX mxmem tile load region to avoid VTCM bank boun…
76c53abd
njsyw1997 hexagon: split unaligned-M HMX matmul into HMX+HVX phases
56c4d3e7
njsyw1997 hexagon: batch-4 Q4_0 dequantize fast path and remove debug traces
73efe1b5
njsyw1997 hexagon: abort on DSP error and fix HMX-to-HVX fallback quantize flag
e23b3edb
njsyw1997 hexagon: support batch matmul. This fix perplexity issue
c65f912d
njsyw1997 hexagon: reuse weights in fp16 batch matmul
5313184b
njsyw1997 hexagon: remove unused HMX flash attention operations and precomputat…
feb4a3fb
njsyw1997 hexagon: remove unused HVX math helpers, debug infrastructure, and st…
06ab76cb
njsyw1997 hexagon: fix HMX not enabled due to missing force_hvx parameter in IDL
689f9804
njsyw1997 hexagon: remove the unnecessary changes not related to HMX
77a166d2
njsyw1997 hexagon: bypass HMX by default
bb12f235
njsyw1997 hexagon: add upstream repo link to htp-ops-lib ported file headers
90e4573c
max-krasnyansky hexagon: restore host buffer support
9b633954
max-krasnyansky hexagon: add HMX=1 option for the adb scripts
485921f2
max-krasnyansky hex-hmx: improve DMA pipelining
8a6534b9
max-krasnyansky hex-hmx: further improvements to dma pipelining
b4a726cf
max-krasnyansky hex-hmx: minor cleanup
aae158c9
max-krasnyansky hex-hmx: move hmx lock out of inner loops/calls
1c961a7f
max-krasnyansky hex-hmx: remove unnecessary state and wrappers
7c3e785c
max-krasnyansky hex-hmx: remove hmx dir and unify f32 to f16 conversions
6ecb04c5
max-krasnyansky hex-hmx: further unify hvx conversions
3478b673
max-krasnyansky hex-hmx: revert f16 converter to the original for now
95ca33f4
max-krasnyansky hex-hmx: minor cleanup for f16 to f32 converter
8611ead7
max-krasnyansky hex-mm: replace incorrect fp16-to-fp32 hmx converter and reformated r…
f3a25148
max-krasnyansky hex-dma: move chanied dma push into hex-dma.h header and update hmx-mm
19b68ab5
max-krasnyansky hex-mm: use hex_is_aligned instead of a duplicated hmx_is_aligned
3645fa3f
max-krasnyansky hex-mm: use hvx_vec_splat_f16 in the hmx code
03dfccf7
max-krasnyansky hex-mm: use VLEN and HTP types in hmx-code
fb937da8
max-krasnyansky hex-mm: remove duplicate QK and defs
3a1dc30d
njsyw1997 hexagon: pre-shuffle quants before vlut16
f3741a3c
max-krasnyansky hexagon: enabel HMX by default
fbeefeab
max-krasnyansky hex-mm: code indent fixes for hmx-matmul
1f0c4e6f
max-krasnyansky hexagon: update hex-utils to include align/smin/etc helpers and use t…
07e428bb
max-krasnyansky hex-mm: more formatting fixes
769d3cb1
max-krasnyansky hex-mm: minor naming updates in hmx code
cd0bb004
max-krasnyansky hex-mm: remove leftover from rebase conflict
b120f771
njsyw1997 force pushed from 8f7eb1c2 to b120f771 22 days ago
github-actions added script
max-krasnyansky commented on 2026-03-19
njsyw1997 Fix the incorrect indents
2b0619f5
max-krasnyansky approved these changes on 2026-03-19
max-krasnyansky merged 74c42ee1 into master 21 days ago