hexagon: hmx flash attention #22347
chraac commented on 2026-04-30
hmx: extract shared interleave headers and unify matmul batched
80987e6d
hmx: add HMX-accelerated flash attention for prefill
3d4f4951
hmx: replace asm wrappers with Q6_ intrinsics in hmx-utils.h
3057aa44
hmx: drop the duplicate interleave_fp16_weight_chunk_to_tiles
4b2e21a1
hmx: apply upstream optimization to hmx-flash-attn-ops.c
05f3918f
hmx: unify interleave helper
3766b5c7
hmx: multi-thread Q load / O store and enable prefill FA dispatch
288a12c8
hmx: relax matmul pipeline gate to cover k > n shapes (e.g. FFN_down)
fc60e7c0
hmx: optimize FA softmax mask phase (no-ALiBi fast path + GQA dedup)
2715ece0
hmx: Add an asm memory clobber at the phase boundary to prevent reord…
437b2a80
[experimental]: fp16 softmax (EXP2_HF) to accelerate fa
1d681b02
hmx flash-attn: refine cost model coefficients based on profiling data
d9b851e6
hmx flash-attn: replace asm clobber with targeted volatile reads on v…
92f85724
hmx flash-attn: fix prefill correctness (dst indexing, softmax reduce…
4df22427
hmx flash-attn: fix p_tiles dual-tile OOB race; enable MT + pipeline
89d1ab8a
hmx flash-attn: preserve additive mask bias in no-ALiBi fast path
32f2b601
hmx: fix softcap+EXP2_HF interaction, tighten matmul pipeline gate, a…
b4afd05b
[Help Wanted]: refactor D matrix computation into separate function f…
4b072adb
format code
2540d8cc
hexagon: looks like -O3 is causing issues with the large code base, s…
f82d4732
hexagon: use hex_ prefix for swap_ptr
e3a36b3d
hexagon: move vtcm_seq_alloc into vtcm-utils.h
7ab07be9
hmx-utils: add hmx_prefix for layout converters
1289238b
hmx-mm: move main hmx_mm functions to the end, remove unused fwd decl…
2e480560
hmx-mm: remove unused qweight_fetch_task_state_t and minor alignment …
156e55c2
hmx-fa: minor alignment fixes
be1a24df
hmx-fa: move hmx_flash_atten into hmx-ops.h
5a10173a
hmx-fa: remove redundant workpool pointer in the hmx_fa_ctx, plus min…
9940257c
hmx-fa: minor alignment and simplifications
f0af71e1
hexagon: move FA_EXP_F16 option to hostside CMake file
a7f6c72f
hmx-fa: use hvx_vec_splat_f16 instead of fp16_to_bits
431a2859
hmx-fa: add hvx_splat_u16/u8 and use that in the fa instead custom hv…
0513d862
hmx-fa: some more alignment updates in the core fa function
ec1bd5c7
hmx-fa: keep slopes in vtcm in fp16
1e10d609
hexagon: consistent noinline usage (after static)
a1f48785
hex-hmx: consistent use FARF_HIGH to enable debug output
65ed5003
hmx-utils: no need for always_inline attr
024a4328
hex-hmx: consistent noinline usage (static noinline ...)
a9967848
hex-hmx: simplify init_col_scales
59dcf9e1
hexagon: fix editorconfig errors
93f9cb6d
hmx-mm: minor alignment fixes
7f16fece
njsyw1997 force pushed from 56ee02f6 to e0ee5570 18 days ago
njsyw1997 force pushed from e0ee5570 to 7f16fece 18 days ago
njsyw1997 marked this pull request as ready for review 18 days ago
lhez approved these changes on 2026-05-02
Assignees: No one assigned
Labels: testing, ggml, Hexagon