llama.cpp
hexagon: hmx flash attention
#22347
Merged

max-krasnyansky merged 41 commits into ggml-org:master from aizip:feat/hmx-fa
github-actions added testing
github-actions added ggml
github-actions added Hexagon
chraac commented on 2026-04-30
njsyw1997 hmx: extract shared interleave headers and unify matmul batched
80987e6d
njsyw1997 hmx: add HMX-accelerated flash attention for prefill
3d4f4951
njsyw1997 hmx: replace asm wrappers with Q6_ intrinsics in hmx-utils.h
3057aa44
njsyw1997 hmx: drop the duplicate interleave_fp16_weight_chunk_to_tiles
4b2e21a1
njsyw1997 hmx: apply upstream optimization to hmx-flash-attn-ops.c
05f3918f
njsyw1997 hmx: unify interleave helper
3766b5c7
njsyw1997 hmx: multi-thread Q load / O store and enable prefill FA dispatch
288a12c8
njsyw1997 hmx: relax matmul pipeline gate to cover k > n shapes (e.g. FFN_down)
fc60e7c0
njsyw1997 hmx: optimize FA softmax mask phase (no-ALiBi fast path + GQA dedup)
2715ece0
njsyw1997 hmx: Add an asm memory clobber at the phase boundary to prevent reord…
437b2a80
njsyw1997 [experimental]: fp16 softmax (EXP2_HF) to accelerate fa
1d681b02
njsyw1997 hmx flash-attn: refine cost model coefficients based on profiling data
d9b851e6
njsyw1997 hmx flash-attn: replace asm clobber with targeted volatile reads on v…
92f85724
njsyw1997 hmx flash-attn: fix prefill correctness (dst indexing, softmax reduce…
4df22427
njsyw1997 hmx flash-attn: fix p_tiles dual-tile OOB race; enable MT + pipeline
89d1ab8a
njsyw1997 hmx flash-attn: preserve additive mask bias in no-ALiBi fast path
32f2b601
njsyw1997 hmx: fix softcap+EXP2_HF interaction, tighten matmul pipeline gate, a…
b4afd05b
njsyw1997 [Help Wanted]: refactor D matrix computation into separate function f…
4b072adb
njsyw1997 format code
2540d8cc
max-krasnyansky hexagon: looks like -O3 is causing issues with the large code base, s…
f82d4732
max-krasnyansky hexagon: use hex_ prefix for swap_ptr
e3a36b3d
max-krasnyansky hexagon: move vtcm_seq_alloc into vtcm-utils.h
7ab07be9
max-krasnyansky hmx-utils: add hmx_prefix for layout converters
1289238b
max-krasnyansky hmx-mm: move main hmx_mm functions to the end, remove unused fwd decl…
2e480560
max-krasnyansky hmx-mm: remove unused qweight_fetch_task_state_t and minor alignment …
156e55c2
max-krasnyansky hmx-fa: minor alignment fixes
be1a24df
max-krasnyansky hmx-fa: move hmx_flash_atten into hmx-ops.h
5a10173a
max-krasnyansky hmx-fa: remove redundant workpool pointer in the hmx_fa_ctx, plus min…
9940257c
max-krasnyansky hmx-fa: minor alignment and simplifications
f0af71e1
max-krasnyansky hexagon: move FA_EXP_F16 option to hostside CMake file
a7f6c72f
max-krasnyansky hmx-fa: use hvx_vec_splat_f16 instead of fp16_to_bits
431a2859
max-krasnyansky hmx-fa: add hvx_splat_u16/u8 and use that in the fa instead custom hv…
0513d862
max-krasnyansky hmx-fa: some more alignment updates in the core fa function
ec1bd5c7
max-krasnyansky hmx-fa: keep slopes in vtcm in fp16
1e10d609
max-krasnyansky hexagon: consistent noinline usage (after static)
a1f48785
max-krasnyansky hex-hmx: consistent use FARF_HIGH to enable debug output
65ed5003
max-krasnyansky hmx-utils: no need for always_inline attr
024a4328
max-krasnyansky hex-hmx: consistent noinline usage (static noinline ...)
a9967848
max-krasnyansky hex-hmx: simplify init_col_scales
59dcf9e1
max-krasnyansky hexagon: fix editorconfig errors
93f9cb6d
max-krasnyansky hmx-mm: minor alignment fixes
7f16fece
njsyw1997 force pushed from 56ee02f6 to e0ee5570 18 days ago
njsyw1997 force pushed from e0ee5570 to 7f16fece 18 days ago
njsyw1997 marked this pull request as ready for review 18 days ago
njsyw1997 requested a review 18 days ago
max-krasnyansky approved these changes on 2026-05-02
lhez approved these changes on 2026-05-02
max-krasnyansky merged 1a03cf47 into master 18 days ago
