PR #19841 server : add chat truncation to keep chat going


ltoniazzi wants to merge 60 commits into ggml-org:master from ltoniazzi:ltoniazzi/add-chat-truncation

github-actions added the testing label

github-actions added the examples label

github-actions added the server label

ggerganov commented on 2026-02-24

ngxson commented on 2026-02-24

ltoniazzi force pushed from 896c2c9e to d89d5b6f 6 days ago

github-actions added the python label

ltoniazzi commented on 2026-02-27

ltoniazzi changed the title from "Add chat truncation [WIP]" to "server : add chat truncation to keep chat going" 5 days ago

Basic truncation version for cli (cc329112)

server version working (d7e9940a)

Add TODOs about trigger threshold computation (e49e10ab)

Add tests for (7a740d65)

Add docstring (3ad265ed)

Improve docstring (27f7c6fa)

Rename var, move args (partial) and ignore cli (ee449e5c)

Refactor params pt2 (1dc9ba46)

Cleanup cli (3db8dbaf)

Cleanup cli (0173105d)

Refactor params v3 and use actual ctx size (47ef2e37)

Remove defensive null vocab guard (669686cd)

Use getter for token choice (49207fed)

Rename n_ctx_slot to n_ctx as not ambiguous where used (a82fe737)

Clean cli (2f3aa726)

Decouple logical parameter computations (1d2bb6bb)

Refactor logic for triggering truncation (cef2aae9)

Move utils to server (d40c004f)

Update cpp tests (tmp) (bcde0f01)

lint (e30d4145)

Add python server tests (eda7c517)

Remove cpp tests (c6551a19)

Add docstrings (cecfa583)

Update tests (8d269a1f)

Rename n predict with server priority (460ffc63)

Allow 1 user message to skip truncation and params bounds (b069858d)

Update tests (e3d2fad9)

Clean up docstrings (46414d33)

Lint (0363f0a8)

Lint (188b3ef8)

Reduce loops on counting user messages (216842c4)

Type warning fix (3aa65a00)

Rename n_ctx_seq (5f1d5ac5)

Lint (58e6de05)

Lint (cea2bc6c)

Lint (b0cbd5e7)

Lint (8aa1f929)

Remove one method (a64fd947)

Lint (be5c2205)

Use max_completion_tokens (4c1844b6)

ltoniazzi force pushed from 57813718 to 4c1844b6 5 days ago

ltoniazzi marked this pull request as ready for review 5 days ago

ltoniazzi commented on 2026-02-27

Fix trailing white spaces (bd1a6caf)

ngxson commented on 2026-02-27

Use vocab from ctx server (5e3d9009)

Remove vocab from chat params (e3bebc55)

Add new tests draft for sleeping and multimodal input (c65d80de)

Add todos to tests (d1eae21e)

Note on n_ctx_seq in chat params (175342e4)

Fix sleeping test assertion (c0296032)

Split truncation params into 2 (3fc73496)

ltoniazzi marked this pull request as draft 3 days ago

Control number of n_predict in test (8e120bf6)

Skip truncation if a media is present (bbf47cdb)

Truncate mtmd counting exact space pos/tokens V0: slow (79ea931e)

Compute images n_pos only once -> calculate tot with text + placeholders (be657635)

O(N) Working version with ugly code (937b6756)

Group by turn and apply template (721c0ca3)

Fix turn count (ae2b1d25)

Use bifurcation (27c0aaab)

Refactor binary search (f2e26200)

Clean up (2080446a)

Polished truncation logic (98f26abc)

Added image tests (109daafa)
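This page capture does not include the PR description, but the later commit titles (Group by turn and apply template, Use bifurcation, Refactor binary search) point to a search for how many of the oldest turns to drop so the prompt fits the context. What follows is a minimal Python sketch of that idea, not the PR's code. `min_turns_to_drop` and its `prompt_tokens` callback are hypothetical names. The callback is assumed to return a token count that never increases as more turns are dropped, and that monotonicity is what makes a binary search valid. Reserving `n_predict` tokens for the reply and never dropping the latest turn are also assumptions.

```python
from typing import Callable, Optional


def min_turns_to_drop(
    n_turns: int,
    prompt_tokens: Callable[[int], int],
    budget: int,
) -> Optional[int]:
    """Return the smallest k such that dropping the k oldest turns fits in budget.

    Hypothetical helper, not llama.cpp code. prompt_tokens(k) must return the
    prompt size after dropping the k oldest turns and be non-increasing in k.
    The most recent turn is never dropped, so k ranges over [0, n_turns - 1].
    Returns None when even the last turn alone does not fit.
    """
    if n_turns < 1:
        raise ValueError("need at least one turn")
    lo, hi = 0, n_turns - 1
    if prompt_tokens(hi) > budget:
        return None  # nothing left to drop; truncation cannot help
    # Invariant: prompt_tokens(hi) <= budget, and every k < lo does not fit.
    while lo < hi:
        mid = (lo + hi) // 2
        if prompt_tokens(mid) <= budget:
            hi = mid  # mid fits, so the answer is mid or smaller
        else:
            lo = mid + 1  # mid does not fit, so more turns must go
    return lo


# Hypothetical example: per-turn token counts (oldest first) plus a fixed system prompt.
turns = [120, 300, 80, 500, 60]
system_tokens = 50
n_ctx, n_predict = 1024, 256
k = min_turns_to_drop(
    len(turns),
    lambda k: system_tokens + sum(turns[k:]),
    n_ctx - n_predict,  # leave room for the reply (assumption)
)
print(k)  # 2
```

Here the full prompt is 1110 tokens against a budget of 768. Dropping one turn leaves 990 tokens, and dropping two leaves 690, so the answer is 2. A bisection needs about log2(N) probes rather than up to N. That matters when each probe re-renders the chat template and re-tokenizes the prompt.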

Reviewers: ngxson, CISC, ggerganov

Assignees: none

Labels: testing, examples, python, server

Milestone: none

llama.cpp server : add chat truncation to keep chat going #19841 Open
