Pull Requests abetlen/llama-cpp-python

docs: add Windows SYCL install example

#2316 opened 2026-06-19 18:24 by cheesss

docs(install): add consolidated GPU backends & build options guide (#2314)

#2315 opened 2026-06-19 01:07 by Anai-Guo

chore(deps): bump actions/checkout from 6 to 7 [labels: dependencies, github_actions]

#2313 opened 2026-06-18 20:03 by dependabot[bot]

server: add descriptions for rope/yarn settings

#2305 opened 2026-06-14 16:24 by rumitvn

chore(deps): bump pypa/cibuildwheel from 3.4.1 to 4.1.0 [labels: dependencies, github_actions]

#2299 opened 2026-06-12 20:03 by dependabot[bot]

feat: add v1 Model API

#2280 opened 2026-06-07 12:11 by abetlen

fix(chat_format): parse Gemma 4 native tool-call tokens into tool_calls (#2227)

#2232 opened 2026-05-28 01:40 by Anai-Guo

security: fix SSRF in multimodal image URL loading (_load_image)

#2220 opened 2026-05-16 21:17 by hoangperry

fix: improve error message when LlamaModel fails to load

#2187 opened 2026-04-21 00:00 by Anai-Guo

Add chat template for gemma models

#2183 opened 2026-04-13 14:33 by C00kieFact0ry

fix: prevent KV cache corruption on SWA/ISWA models + hot-path perf

#2180 opened 2026-04-12 15:45 by avion23

perf: vectorize KV cache prefix matching with numpy

#2179 opened 2026-04-11 22:55 by nausicaalii

build: disable soname to reduce binary size

#2177 opened 2026-04-09 16:33 by Bing-su

feat: add `reasoning_effort` to chat completions API

#2167 opened 2026-03-30 04:54 by abetlen

fix: auto-disable mmap when all layers offloaded to GPU (#1964)

#2147 opened 2026-03-22 15:42 by ljluestc

Clear kv cache and reset tokens after chat completion

#2141 opened 2026-03-14 03:59 by thisisayushg

This PR implements the previously stubbed state management methods in the _internals.py module and updates the corresponding API calls in llama.py to use the correct underlying C++ function names.

#2134 opened 2026-03-05 02:30 by bsides230

feat: Add DeepSeek R1 and distilled model support

#2131 opened 2026-03-01 20:50 by ljluestc

feat: add streaming tool use (rebased #1884 on latest main)

#2129 opened 2026-02-23 05:17 by XyLearningProgramming

chore: bump conda-incubator/setup-miniconda from v3.1.0 to v3.3.0

#2128 opened 2026-02-22 11:37 by Aiudadadadf

feat: support Granite-Docling model

#2109 opened 2026-01-04 05:35 by dhdaines

Fix issue #2096: Handle URLs with embedded HTTP credentials in _load_image

#2102 opened 2025-12-10 23:25 by nMaroulis

chore: update typing-extensions dependency and set github actions setup-python to v6

#2099 opened 2025-11-28 19:33 by AnvithaCodes

Fix: Install correct CUDA toolkit during build

#2088 opened 2025-11-12 03:20 by chamalgomes

Include x64 directory for CUDA DLLs on Windows

#2083 opened 2025-10-24 15:40 by ajparsons

Better Qwen2.5-VL chat template.

#2066 opened 2025-09-07 00:04 by alcoftTAO

Add timeout and error handling in FastAPI uvicorn server

#2044 opened 2025-07-22 10:46 by amandwivedi45

Actually create a random seed when using seed = -1 on load

#2042 opened 2025-07-16 09:00 by m-from-space

Improve error message when model file is missing

#2041 opened 2025-07-09 10:06 by NITHIN0710

ARM Runners support CUDA SBSA

#2039 opened 2025-07-07 10:45 by johnnynunez