server: avoid unnecessary checkpoint restore when new tokens are present (#24110)
* server: avoid unnecessary checkpoint restore when new tokens are present
The pos_min_thold calculation unconditionally subtracts 1 to ensure at
least one token is evaluated for logits when no new tokens exist.
However, when the request contains new tokens beyond the cached prefix,
this -1 is overly conservative and may trigger an unnecessary checkpoint
restore.
Apply the -1 only when n_past >= task.n_tokens() (i.e. no new tokens),
avoiding a redundant KV state restore when there is actual work to do.
* cont : add ref
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>