llama.cpp
server : add chat truncation to keep chat going
#19841
Open
Go
Login via GitHub
Home
Pricing
FAQ
Install
Login
via GitHub
Overview
Commits
60
Changes
View On
GitHub
server : add chat truncation to keep chat going
#19841
ltoniazzi
wants to merge 60 commits into
ggml-org:master
from
ltoniazzi:ltoniazzi/add-chat-truncation
github-actions
added
testing
github-actions
added
examples
github-actions
added
server
ggerganov
commented on 2026-02-24
ngxson
commented on 2026-02-24
ngxson
commented on 2026-02-24
ltoniazzi
force pushed
from
896c2c9e
to
d89d5b6f
6 days ago
github-actions
added
python
ltoniazzi
commented on 2026-02-27
ltoniazzi
changed the title
Add chat truncation [WIP]
server : add chat truncation to keep chat going
5 days ago
Basic truncation version for cli
cc329112
server version working
d7e9940a
Add TODOs about trigger threshold computation
e49e10ab
Add tests for
7a740d65
Add docstring
3ad265ed
Improve docstring
27f7c6fa
Rename var, move args (partial) and ignore cli
ee449e5c
Refactor params pt2
1dc9ba46
Cleanup cli
3db8dbaf
Cleanup cli
0173105d
Refactor params v3 and use actual ctx size
47ef2e37
Remove defensive null vocab guard
669686cd
Use getter for token choice
49207fed
Rename n_ctx_slot to n_ctx as not ambiguous where used
a82fe737
Clean cli
2f3aa726
Decouple logical parameter computations
1d2bb6bb
Refactor logic for triggering truncation
cef2aae9
Move utils to server
d40c004f
Update cpp tests (tmp)
bcde0f01
lint
e30d4145
Add python server tests
eda7c517
Remove cpp tests
c6551a19
Add docstrings
cecfa583
Update tests
8d269a1f
Rename n predict with server priority
460ffc63
Allow 1 user message to skip truncation and params bounds
b069858d
Update tests
e3d2fad9
Clean up docstrings
46414d33
Lint
0363f0a8
Lint
188b3ef8
Reduce loops on counting user messages
216842c4
Type warning fix
3aa65a00
Rename n_ctx_seq
5f1d5ac5
Lint
58e6de05
Lint
cea2bc6c
Lint
b0cbd5e7
Lint
8aa1f929
Remove one method
a64fd947
Lint
be5c2205
Use max_completion_tokens
4c1844b6
ltoniazzi
force pushed
from
57813718
to
4c1844b6
5 days ago
ltoniazzi
marked this pull request as ready for review
5 days ago
ltoniazzi
commented on 2026-02-27
ltoniazzi
commented on 2026-02-27
ltoniazzi
commented on 2026-02-27
Fix trailing white spaces
bd1a6caf
ngxson
commented on 2026-02-27
ngxson
commented on 2026-02-27
ngxson
commented on 2026-02-27
Use vocab from ctx server
5e3d9009
Remove vocab from chat params
e3bebc55
Add new tests draft for sleeping and multimodal input
c65d80de
Add todos to tests
d1eae21e
Note on n_ctx_seq in chat params
175342e4
Fix sleeping test assertion
c0296032
Split truncation params into 2
3fc73496
ltoniazzi
marked this pull request as draft
3 days ago
Control number of n_predict in test
8e120bf6
Skip truncation if a media is present
bbf47cdb
Truncate mtmd counting exact space pos/tokens V0: slow
79ea931e
Compute images n_pos only once -> calculate tot with text + placeholders
be657635
O(N) Working version with ugly code
937b6756
Group by turn and apply template
721c0ca3
Fix turn count
ae2b1d25
Use bifurcation
27c0aaab
Refactor binary search
f2e26200
Clean up
2080446a
Polished truncation logic
98f26abc
Added image tests
109daafa
Login to write a write a comment.
Login via GitHub
Reviewers
ngxson
CISC
ggerganov
Assignees
No one assigned
Labels
testing
examples
python
server
Milestone
No milestone
Login to write a write a comment.
Login via GitHub