llama.cpp
hexagon: improved Op queuing, buffer and cache management
#21705
Merged

max-krasnyansky hexagon: introduce op request batching and rewrite buffer management
93b7adde
max-krasnyansky hex-dma: disable l2 bypass to work around new issue due to no f…
bd14a8fc
max-krasnyansky hex-utils: add explicit l2flush and l2clear helpers
698e808a
max-krasnyansky hex-opreq: use fine-grain per tensor l2 management
99ad6017
max-krasnyansky hex-opreq: avoid redundant invalidates for tensors we already flushed
286a3454
max-krasnyansky hex-opreq: update debug messages
7e8a497c
max-krasnyansky htp-opreq: reuse ops_context
baf7d9e3
trivikram-reddy1 hex-opreq: do not flush or invalidate cache lines beyond buffer boundary
ae09b733
trivikram-reddy1 hex-opreq: fix errors in log message
0d3dcdf7
max-krasnyansky Revert "hex-opreq: do not flush or invalidate cache lines beyond buff…
18a5de03
max-krasnyansky hexagon: limit l2 flushes to 1MB which covers l2 cache
bd50ee7e
max-krasnyansky hex-opreq: limit cache flush to 4MB
9ff825d8
max-krasnyansky hexagon: drop cache flush size to 2MB
333310c7
max-krasnyansky hex-opreq: start reworking opreq packing
98e8aa6b
max-krasnyansky hex-opreq: introduce new way of packing opbatch where tensors are sto…
e896c2c1
max-krasnyansky hex-opreq: add a simple fastrpc call to force unmap all buffers
6df167c3
max-krasnyansky hex-l2flush: somehow 2MB does not seem robust, also cleanup step size…
cffcad06
max-krasnyansky hex-opreq: bump opreq batch size to 256
33e099f2
max-krasnyansky hex-mm: place src1 spad at the top of vtcm for easy reuse
7607147a
max-krasnyansky hex-ops: introduce internal types and disable src1 reuse for now
c0b94167
max-krasnyansky htp-opreq: use tensor pointers instead of copies
8384c3e1
max-krasnyansky hex-opreq: introduce more robust way for tracking vtcm/spad reuse
f820e387
max-krasnyansky hex-cumsum: fix error post opreq merge
7c0cbd06
max-krasnyansky hex-opreq: move request batch handling into the session
14d9737a
max-krasnyansky hex-mm: yet another fix for src1 reuse when we're mixing hmx/hvx
6ab7e2fc
max-krasnyansky hex-bufs: introduce pinned mappings and use non-pinned ones for model…
a5beb908
max-krasnyansky hex-buf: add support for allocating shared/pinned buffer for opreqs
890acfa5
max-krasnyansky hex-opbatch: make opbatches configurable
11b03dea
max-krasnyansky hex-naming: better name for ggml_hexagon_shared_buffer
b1436450
max-krasnyansky hex-naming: add session->c_name() helper
e22aa802
max-krasnyansky hex-opbatch: start using shm but still copy for now
463be35f
max-krasnyansky hex-opbatch: use shared buffer for packing opbatch
9a2c9d1e
max-krasnyansky hex-opbatch: better naming for opbatch related classes and code
2a7dc001
max-krasnyansky hex-opbatch: reuse batched tensors with same data/dims/strides
c79d8668
max-krasnyansky hex-opbatch: update logging
b76c9319
max-krasnyansky hex-opbatch: add support for vmem limit for op batching
47718cb5
max-krasnyansky hex-opbatch: update htp side to properly support dynamic mmap/unmap
508a6f03
max-krasnyansky hex-opbatch: add OB and OQ params for run-completion script and fix t…
3d72ca00
max-krasnyansky hex-opbatch: fixed src1 handling in act ops
23e25387
max-krasnyansky hex-act: fix empty src1 handling in swiglu and friends
3a06ef6b
max-krasnyansky hex-mm: minor fix vtcm and dma handling in matmul
3c04b2c8
max-krasnyansky hex-opbatch: allocate extra 1KB for dspqueue overhead
eb1b1066
max-krasnyansky hexagon: fix softmax for non-aligned tensors and cleanup vtcm alloc
fe9369b9
max-krasnyansky hex-mm: properly handle hmx_disabled flag
87b2f47d
max-krasnyansky hex-ops: update comments
23c86462
max-krasnyansky hex-ops: add debug output for get/set-rows
63246924
max-krasnyansky hex-mmap: optimize un/mapping of buffers
3a2f0c06
trivikram-reddy1 hex-opreq: global cache flush and invalidate beyond 128KB threshold
3980c32e
max-krasnyansky hex-ops: add super simple opfilter regex for debugging
e5b5d554
max-krasnyansky hex-opbatch: wireup newer ops missed in merge and update main switch …
954cf842
max-krasnyansky hexagon: improved vtcm acquisition to remove inter-op overhead
835a3ab0
max-krasnyansky hex-mm: fixed hvx fallback path
38e3d03e
max-krasnyansky hex-mm: lower the vmem threshold a bit further to ~3GB
5cffc96f
max-krasnyansky hexagon: update debug & error logs
1524da86
max-krasnyansky hexagon: move ops context into main context
caa9ce28
max-krasnyansky hex-opbatch: cleanup naming and headers for opbatch and related descr…
aa0ef5b9
max-krasnyansky hex-fa: it's now better to enable FA during TG to reduce graph splits
14da2a1c
max-krasnyansky hexagon: remove GGML_HEXAGON_EXPERIMENTAL env var
3c666579
max-krasnyansky requested a review 72 days ago
max-krasnyansky changed the title from "Hexagon opbatch" to "hexagon: improved Op queuing, buffer and cache management" 72 days ago
max-krasnyansky hexagon: fixed editorconfig check
334caa10
github-actions added labels: documentation, script, ggml, Hexagon
CISC approved these changes on 2026-04-10
max-krasnyansky Update ggml/src/ggml-hexagon/ggml-hexagon.cpp
6ff98500
lhez approved these changes on 2026-04-10
max-krasnyansky merged 9aa28077 into master 71 days ago
max-krasnyansky deleted the hexagon-opbatch branch 67 days ago