llama.cpp
Support requantizing kvcache while model is loaded
#24367
Open

wadealexc
wadealexc feat(llama-server): when restoring from slot, automatically quantize …
4b8e60c0
wadealexc feat(llama-server): add POST /requantize_kvcache endpoint
21a0b4e7
wadealexc refactor: clean up implementation
4875dc76
wadealexc feat: add support for draft models
1b8cfd85
wadealexc requested a review 12 days ago
wadealexc requested a review from ggerganov 12 days ago
github-actions added the examples label
github-actions added the server label
ngxson requested changes on 2026-06-09

