llama.cpp
server : support unified cache across slots
#16736
Merged

ggerganov merged 14 commits into master from gg/server-unified-slots
github-actions added examples
github-actions added server
slaren commented on 2025-10-23
github-actions added python
ggerganov force pushed from 7a25d4b5 46 days ago
ggerganov force pushed 46 days ago
ggerganov force pushed 46 days ago
ggerganov force pushed to 6369fe09 46 days ago
github-actions added testing
ggerganov force pushed from 6369fe09 to ac261bea 45 days ago
ggerganov commented on 2025-10-29
ggerganov force pushed from ac261bea 44 days ago
ggerganov force pushed to 4e9e319b 44 days ago
ggerganov marked this pull request as ready for review 44 days ago
ggerganov requested a review from ngxson 44 days ago
ggerganov requested a review from CISC 44 days ago
ggerganov requested a review from slaren 44 days ago
slaren commented on 2025-11-01
ngxson commented on 2025-11-01
ggerganov server : support unified context across slots
57ece5ba
ggerganov cont : fix speculative decoding initialization
a42fb771
ggerganov context : fix n_ctx_per_seq computation
492f628c
ggerganov server : purge slots one by one
8222e9c2
ggerganov tests : add unified cache server tests
21791750
ggerganov llama : update per-seq context computation
f0f105ff
ggerganov test-thread-safety : handle tiny training context of the input model
e7b7cbfb
ggerganov server : fix server_tokens clear()
290f6a9f
ggerganov server : use 4 slots + unified KV by default
23323cd1
ggerganov llama : add note about context size queries
f2cca024
ggerganov cont : update todos [no ci]
ff684363
ggerganov context : do not cap the size of the context
c08d0d14
ggerganov force pushed from 93373cc5 to c08d0d14 42 days ago
ggerganov tests : adjust parameters to be CI friendlier
356dc08b
slaren approved these changes on 2025-11-01
ggerganov context : add warning
56fceee2
ggerganov merged cd5e3b57 into master 41 days ago
ggerganov deleted the gg/server-unified-slots branch 41 days ago
