llama.cpp
Support requantizing kvcache while model is loaded
#24367

Open

wadealexc wants to merge 4 commits into ggml-org:master from wadealexc:support-requantize-memory

feat(llama-server): when restoring from slot, automatically quantize …

4b8e60c0

feat(llama-server): add POST /requantize_kvcache endpoint

21a0b4e7

refactor: clean up implementation

4875dc76

feat: add support for draft models

1b8cfd85

wadealexc requested a review from ggerganov 12 days ago

github-actions added the examples and server labels

ngxson requested changes on 2026-06-09

Reviewers

ngxson

ggerganov

Labels

examples server
