This PR implements the previously stubbed state management methods in the _internals.py module and updates the corresponding API calls in llama.py to use the correct underlying C++ function names. #2134
feat: update llama.cpp submodule and bindings for Qwen 3.5 support
d21ef679
fix: set BUILD_NUMBER and LLAMA_INSTALL_VERSION for mtmd build
eacc2584
fix: return bool from kv_cache_seq_rm for partial removal detection
01248477
fix: handle GDN hybrid models that reject partial memory removal
47aedc22
Update llama.cpp submodule to latest ggml-org
2ee9d3d3
This PR implements the previously stubbed state management methods in…
1a4f7589
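The two `kv_cache_seq_rm` commits above suggest a pattern where the caller checks a boolean result and falls back when a partial removal is rejected. Below is a minimal sketch of that idea, not the PR's actual code: `ToyMemory`, `allows_partial`, and `truncate_to_prefix` are hypothetical names invented for illustration, and the only assumption taken from the commit titles is that a removal call returns a bool and that some hybrid models refuse partial removal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMemory:
    """Hypothetical stand-in for a model's sequence memory (not the real binding)."""
    tokens: List[int] = field(default_factory=list)
    # Assumption: hybrid/recurrent-style memory would set this to False.
    allows_partial: bool = True

    def seq_rm(self, p0: int, p1: int) -> bool:
        """Remove positions [p0, p1); p1 < 0 means 'to the end'.

        Returns False, leaving the memory untouched, if the removal is rejected.
        """
        start = max(p0, 0)
        end = len(self.tokens) if p1 < 0 else min(p1, len(self.tokens))
        if start >= end:
            return True  # nothing to remove
        covers_everything = start == 0 and end == len(self.tokens)
        if not covers_everything and not self.allows_partial:
            return False  # partial removal rejected
        del self.tokens[start:end]
        return True


def truncate_to_prefix(mem: ToyMemory, keep: int) -> int:
    """Try to keep the first `keep` tokens; return how many cached tokens remain."""
    if mem.seq_rm(keep, -1):
        return len(mem.tokens)
    # Fallback: clear everything so the caller re-evaluates the prompt from scratch.
    mem.seq_rm(0, -1)
    return 0
```

With this sketch, a caller that previously assumed the prefix was always preserved can instead use the returned count to decide how much of the prompt must be evaluated again.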
Assignees
No one assigned