Support requantizing kvcache while model is loaded #24367
feat(llama-server): when restoring from slot, automatically quantize …
4b8e60c0
feat(llama-server): add POST /requantize_kvcache endpoint
21a0b4e7
refactor: clean up implementation
4875dc76
feat: add support for draft models
1b8cfd85
ngxson requested changes on 2026-06-09
Assignees
No one assigned