abetlen/llama-cpp-python
Pull Requests
Open
docs: add Windows SYCL install example
#2316 opened 2026-06-19 18:24 by
cheesss
docs(install): add consolidated GPU backends & build options guide (#2314)
#2315 opened 2026-06-19 01:07 by
Anai-Guo
chore(deps): bump actions/checkout from 6 to 7
dependencies
github_actions
#2313 opened 2026-06-18 20:03 by
dependabot[bot]
server: add descriptions for rope/yarn settings
#2305 opened 2026-06-14 16:24 by
rumitvn
chore(deps): bump pypa/cibuildwheel from 3.4.1 to 4.1.0
dependencies
github_actions
#2299 opened 2026-06-12 20:03 by
dependabot[bot]
feat: add v1 Model API
#2280 opened 2026-06-07 12:11 by
abetlen
fix(chat_format): parse Gemma 4 native tool-call tokens into tool_calls (#2227)
#2232 opened 2026-05-28 01:40 by
Anai-Guo
security: fix SSRF in multimodal image URL loading (_load_image)
#2220 opened 2026-05-16 21:17 by
hoangperry
fix: improve error message when LlamaModel fails to load
#2187 opened 2026-04-21 00:00 by
Anai-Guo
Add chat template for gemma models
#2183 opened 2026-04-13 14:33 by
C00kieFact0ry
fix: prevent KV cache corruption on SWA/ISWA models + hot-path perf
#2180 opened 2026-04-12 15:45 by
avion23
perf: vectorize KV cache prefix matching with numpy
#2179 opened 2026-04-11 22:55 by
nausicaalii
build: disable soname to reduce binary size
#2177 opened 2026-04-09 16:33 by
Bing-su
feat: add `reasoning_effort` to chat completions API
#2167 opened 2026-03-30 04:54 by
abetlen
fix: auto-disable mmap when all layers offloaded to GPU (#1964)
#2147 opened 2026-03-22 15:42 by
ljluestc
Clear kv cache and reset tokens after chat completion
#2141 opened 2026-03-14 03:59 by
thisisayushg
This PR implements the previously stubbed state management methods in the _internals.py module and updates the corresponding API calls in llama.py to use the correct underlying C++ function names.
#2134 opened 2026-03-05 02:30 by
bsides230
feat: Add DeepSeek R1 and distilled model support
#2131 opened 2026-03-01 20:50 by
ljluestc
feat: add streaming tool use (rebased #1884 on latest main)
#2129 opened 2026-02-23 05:17 by
XyLearningProgramming
chore: bump conda-incubator/setup-miniconda from v3.1.0 to v3.3.0
#2128 opened 2026-02-22 11:37 by
Aiudadadadf
feat: support Granite-Docling model
#2109 opened 2026-01-04 05:35 by
dhdaines
Fix issue #2096: Handle URLs with embedded HTTP credentials in _load_image
#2102 opened 2025-12-10 23:25 by
nMaroulis
chore: update typing-extensions dependency and set github actions setup-python to v6
#2099 opened 2025-11-28 19:33 by
AnvithaCodes
Fix: Install correct CUDA toolkit during build
#2088 opened 2025-11-12 03:20 by
chamalgomes
Include x64 directory for CUDA DLLs on Windows
#2083 opened 2025-10-24 15:40 by
ajparsons
Better Qwen2.5-VL chat template.
#2066 opened 2025-09-07 00:04 by
alcoftTAO
Add timeout and error handling in FastAPI uvicorn server
#2044 opened 2025-07-22 10:46 by
amandwivedi45
Actually create a random seed when using seed = -1 on load
#2042 opened 2025-07-16 09:00 by
m-from-space
Improve error message when model file is missing
#2041 opened 2025-07-09 10:06 by
NITHIN0710
ARM Runners support CUDA SBSA
#2039 opened 2025-07-07 10:45 by
johnnynunez