llama.cpp
hexagon: improved Op queuing, buffer and cache management
#21705
Merged
max-krasnyansky
merged 60 commits into
ggml-org:master
from
qualcomm:hexagon-opbatch
hexagon: introduce op request batching and rewrite buffer managment
93b7adde
hex-dma: disable l2 bypass since to work around new issue due to no f…
bd14a8fc
hex-utils: add explicit l2flush and l2clear helpers
698e808a
hex-opreq: use fine-grain per tensor l2 management
99ad6017
hex-opreq: avoid redundant invalidates for tensors we already flushed
286a3454
hex-opreq: update debug messages
7e8a497c
htp-opreq: reuse ops_context
baf7d9e3
hex-opreq: do not flush or invalidate cache lines beyond buffer boundry
ae09b733
hex-opreq: fix errors in log message
0d3dcdf7
Revert "hex-opreq: do not flush or invalidate cache lines beyond buff…
18a5de03
hexagon: limit l2 flushes to 1MB which covers l2 cache
bd50ee7e
hex-opreq: limit cache flush to 4MB
9ff825d8
hexagon: drop cache flush size to 2MB
333310c7
hex-opreq: start reworking opreq packing
98e8aa6b
hex-opreq: introduce new way of packing opbatch where tensors are sto…
e896c2c1
hex-opreq: add a simple fastrpc call to force unmap all buffers
6df167c3
hex-l2flush: somehow 2MB does not seem robust, also cleanup step size…
cffcad06
hex-opreq: bump opreq batch size to 256
33e099f2
hex-mm: place src1 spad at the top of vtcm for easy reuse
7607147a
hex-ops: introduce internal types and disable src1 reuse for now
c0b94167
htp-opreq: use tensor pointers instead of copies
8384c3e1
hex-opreq: introduce more robust way for tracking vtcm/spad reuse
f820e387
hex-cumsum: fix error post opreq merge
7c0cbd06
hex-opreq: move request batch handling into the session
14d9737a
hex-mm: yet another fix for src1 reuse when we're mixing hmx/hvx
6ab7e2fc
hex-bufs: introduce pinned mmapings and use non-pinned ones for model…
a5beb908
hex-buf: add support for allocating shared/pinned buffer for opreqs
890acfa5
hex-opbatch: make opbatches configurable
11b03dea
hex-naming: better name for ggml_hexagon_shared_buffer
b1436450
hex-naming: add session->c_name() helper
e22aa802
hex-opbatch: start using shm but still copy for now
463be35f
hex-opbatch: use shared buffer for packing opbatch
9a2c9d1e
hex-opbatch: beter naming for opbatch related classes and code
2a7dc001
hex-opbatch: reuse batched tensors with same data/dims/strides
c79d8668
hex-opbatch: update logging
b76c9319
hex-opbatch: add support for vmem limit for op batching
47718cb5
hex-opbatch: update htp side to properly support dynamic mmap/unmap
508a6f03
hex-opbatch: add OB and OQ params for run-completion script and fix t…
3d72ca00
hex-opbatch: fixed src1 handling in act ops
23e25387
hex-act: fix empty src1 handling in swiglu and friends
3a06ef6b
hex-mm: minor fix vtcm and dma handling in matmul
3c04b2c8
hex-opbatch: allocate extra 1KB for dspqueue overhead
eb1b1066
hexagon: fix softmax for non-aligned tensors and cleanup vtcm alloc
fe9369b9
hex-mm: properly handle hmx_disabled flag
87b2f47d
hex-ops: update comments
23c86462
hex-ops: add debug output for get/set-rows
63246924
hex-mmap: optimize un/mapping of buffers
3a2f0c06
hex-opreq: global cache flush and invalidate beyond 128KB threshold
3980c32e
hex-ops: add super simple opfilter regex for debugging
e5b5d554
hex-opbatch: wireup newer ops missed in merge and update main switch …
954cf842
hexagon: improved vtcm acquision to remove inter-op overhead
835a3ab0
hex-mm: fixed hvx fallback path
38e3d03e
hex-mm: lower the vmem threshold a bit further to ~3GB
5cffc96f
hexagon: update debug & error logs
1524da86
hexagon: move ops context into main context
caa9ce28
hex-opbatch: cleanup naming and headers for opbatch and related descr…
aa0ef5b9
hex-fa: it's now better to enable FA during TG to reduce graph splits
14da2a1c
hexagon: remove GGML_HEXAGON_EXPERIMENTAL env var
3c666579
max-krasnyansky
requested a review
72 days ago
max-krasnyansky
changed the title from
Hexagon opbatch
to
hexagon: improved Op queuing, buffer and cache management
72 days ago
hexagon: fixed editorconfig check
334caa10
github-actions
added
documentation
github-actions
added
script
github-actions
added
ggml
github-actions
added
Hexagon
CISC
approved these changes on 2026-04-10
Update ggml/src/ggml-hexagon/ggml-hexagon.cpp
6ff98500
lhez
approved these changes on 2026-04-10
max-krasnyansky
merged
9aa28077
into master
71 days ago
max-krasnyansky
deleted the hexagon-opbatch branch
67 days ago
Reviewers
lhez
CISC
Assignees
No one assigned
Labels
documentation
script
ggml
Hexagon
Milestone
No milestone