llama.cpp
ggml, llama : add KV cache size limiting and block tracking infrastructure
#18747
Open
pestopoppa wants to merge 17 commits into ggml-org:master from pestopoppa:feature/paged-attention
feat: add --moe-n-expert flag for MoE expert count override (Hard Mask)
553b6dce
feat: add layer skip / early exit support for speculative decoding
b5e11afb
feat: add layer skip support for qwen3vl-moe and qwen3next
42e7d627
lookahead: fix n_seq_max and kv_unified configuration
7bf427dc
lookup, lookahead: fix crash when n_ctx not specified
2a16c438
ggml-cpu: parallelize tensor repacking with OpenMP
2ee7aa7e
docs: add branch management rules to prevent build issues
e3053631
kv-cache : optimize SWA slot reuse with forward-looking masking
394e0cb3
kv-cache: fix SWA cell reuse to ensure mathematical correctness
6b43356a
feat: implement CPU paged attention for flash attention
de4f93c9
feat: implement dynamic block allocation for paged attention
c0ca18b7
feat: add block pool statistics for debugging paged attention
eb40d730
feat: add KV cache memory reduction for paged attention
b14fe3bf
test: add unit tests for block pool and table
e14387ae
feat: add CLI flags for paged attention
9db451ee
refactor: trim verbose comments in llama-kv-block.h
0b633c35
pestopoppa requested a review from ggerganov 2 days ago
pestopoppa requested a review from CISC 2 days ago
pestopoppa requested a review from JohannesGaessler 2 days ago
github-actions added the model label
github-actions added the testing label
github-actions added the examples label
github-actions added the ggml label
pestopoppa changed the title from "ggml, llama : add CPU paged attention for memory-efficient KV cache" to "ggml, llama : add KV cache size limiting and block tracking infrastructure" 1 day ago
refactor: remove unrelated changes from KV cache PR
d98013d1
Reviewers: ggerganov, CISC, JohannesGaessler
Assignees: No one assigned
Labels: model, testing, examples, ggml
Milestone: No milestone