llama.cpp
server : add chat truncation to keep chat going
#19841
Open

ltoniazzi
github-actions added testing
github-actions added examples
github-actions added server
pwilkin
Midaychi
ggerganov commented on 2026-02-24
ngxson commented on 2026-02-24
ngxson commented on 2026-02-24
ltoniazzi force pushed from 896c2c9e to d89d5b6f 6 days ago
github-actions added python
ltoniazzi commented on 2026-02-27
ltoniazzi changed the title from "Add chat truncation [WIP]" to "server : add chat truncation to keep chat going" 5 days ago
ltoniazzi Basic truncation version for cli
cc329112
ltoniazzi server version working
d7e9940a
ltoniazzi Add TODOs about trigger threshold computation
e49e10ab
ltoniazzi Add tests for
7a740d65
ltoniazzi Add docstring
3ad265ed
ltoniazzi Improve docstring
27f7c6fa
ltoniazzi Rename var, move args (partial) and ignore cli
ee449e5c
ltoniazzi Refactor params pt2
1dc9ba46
ltoniazzi Cleanup cli
3db8dbaf
ltoniazzi Cleanup cli
0173105d
ltoniazzi Refactor params v3 and use actual ctx size
47ef2e37
ltoniazzi Remove defensive null vocab guard
669686cd
ltoniazzi Use getter for token choice
49207fed
ltoniazzi Rename n_ctx_slot to n_ctx as not ambiguous where used
a82fe737
ltoniazzi Clean cli
2f3aa726
ltoniazzi Decouple logical parameter computations
1d2bb6bb
ltoniazzi Refactor logic for triggering truncation
cef2aae9
ltoniazzi Move utils to server
d40c004f
ltoniazzi Update cpp tests (tmp)
bcde0f01
ltoniazzi lint
e30d4145
ltoniazzi Add python server tests
eda7c517
ltoniazzi Remove cpp tests
c6551a19
ltoniazzi Add docstrings
cecfa583
ltoniazzi Update tests
8d269a1f
ltoniazzi Rename n predict with server priority
460ffc63
ltoniazzi Allow 1 user message to skip truncation and params bounds
b069858d
ltoniazzi Update tests
e3d2fad9
ltoniazzi Clean up docstrings
46414d33
ltoniazzi Lint
0363f0a8
ltoniazzi Lint
188b3ef8
ltoniazzi Reduce loops on counting user messages
216842c4
ltoniazzi Type warning fix
3aa65a00
ltoniazzi Rename n_ctx_seq
5f1d5ac5
ltoniazzi Lint
58e6de05
ltoniazzi Lint
cea2bc6c
ltoniazzi Lint
b0cbd5e7
ltoniazzi Lint
8aa1f929
ltoniazzi Remove one method
a64fd947
ltoniazzi Lint
be5c2205
ltoniazzi Use max_completion_tokens
4c1844b6
ltoniazzi force pushed from 57813718 to 4c1844b6 5 days ago
ltoniazzi marked this pull request as ready for review 5 days ago
ltoniazzi commented on 2026-02-27
ltoniazzi commented on 2026-02-27
ltoniazzi commented on 2026-02-27
ltoniazzi Fix trailing white spaces
bd1a6caf
ngxson commented on 2026-02-27
ngxson commented on 2026-02-27
ngxson commented on 2026-02-27
ltoniazzi Use vocab from ctx server
5e3d9009
aviallon
ltoniazzi Remove vocab from chat params
e3bebc55
ltoniazzi Add new tests draft for sleeping and multimodal input
c65d80de
ltoniazzi Add todos to tests
d1eae21e
ltoniazzi Note on n_ctx_seq in chat params
175342e4
ltoniazzi Fix sleeping test assertion
c0296032
ltoniazzi Split truncation params into 2
3fc73496
ltoniazzi marked this pull request as draft 3 days ago
ltoniazzi Control number of n_predict in test
8e120bf6
ltoniazzi Skip truncation if a media is present
bbf47cdb
ltoniazzi Truncate mtmd counting exact space pos/tokens V0: slow
79ea931e
ltoniazzi Compute images n_pos only once -> calculate tot with text + placeholders
be657635
ltoniazzi
ngxson
ngxson
ltoniazzi
ngxson
ltoniazzi O(N) Working version with ugly code
937b6756
ltoniazzi
ltoniazzi Group by turn and apply template
721c0ca3
ltoniazzi Fix turn count
ae2b1d25
ngxson
ltoniazzi
ltoniazzi
ngxson
ngxson
ltoniazzi
ltoniazzi Use bifurcation
27c0aaab
ltoniazzi Refactor binary search
f2e26200
ltoniazzi Clean up
2080446a
ltoniazzi Polished truncation logic
98f26abc
ltoniazzi Added image tests
109daafa
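The later commits ("Group by turn and apply template", "Use bifurcation", "Refactor binary search", "Use max_completion_tokens") suggest that the final approach groups the chat into turns, re-renders the template, and binary-searches for how many of the oldest turns to drop so the prompt plus the reserved completion tokens fits in the context. The following is a minimal sketch of that search only, not the PR's code. It assumes the templated prompt length never grows as more leading turns are dropped, that the newest turn is always kept, and that the reserve comes from something like max_completion_tokens. The names min_turns_to_drop, prompt_tokens and n_reserve are hypothetical.

```python
from typing import Callable


def min_turns_to_drop(n_turns: int,
                      prompt_tokens: Callable[[int], int],
                      n_ctx: int,
                      n_reserve: int) -> int:
    """Hypothetical helper: smallest k such that dropping the k oldest turns
    leaves room for n_reserve completion tokens within n_ctx.
    Returns -1 if even keeping only the newest turn does not fit.

    prompt_tokens(k) stands in for "apply the chat template to the remaining
    turns and tokenize"; it is assumed to be non-increasing in k.
    """
    if n_turns == 0:
        return -1
    budget = n_ctx - n_reserve

    def fits(k: int) -> bool:
        return prompt_tokens(k) <= budget

    lo, hi = 0, n_turns - 1  # never drop the newest turn
    if not fits(hi):
        return -1
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid        # mid fits: the answer is mid or fewer turns
        else:
            lo = mid + 1    # still too long: must drop more turns
    return lo


# Toy usage: 20 tokens of system/template overhead plus per-turn token costs.
turn_costs = [100, 80, 60, 40]


def toy_prompt_tokens(k: int) -> int:
    return 20 + sum(turn_costs[k:])


print(min_turns_to_drop(len(turn_costs), toy_prompt_tokens, n_ctx=256, n_reserve=64))
```

With a budget of 256 - 64 = 192 tokens, keeping all turns costs 300 and dropping one costs 200, so the sketch drops the two oldest turns (120 tokens). Binary search needs only O(log n) template renders instead of one per turn, which is presumably why it replaced the earlier linear version; the toy cost function is additive only for illustration, since real template output need not be.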
ltoniazzi

Assignees: no one assigned