ggerganov/llama.cpp
Pull Requests
Open
UI: Add support for calling API endpoints on remote llama-server
examples
server/ui
#24383 opened 2026-06-09 23:03 by
niutech
server: avoid forwarding auth headers in CORS proxy
examples
python
server
server/ui
#24373 opened 2026-06-09 18:18 by
ItsMatti4
metal : wind down leftover residency sets at teardown instead of aborting
ggml
Apple Metal
#24368 opened 2026-06-09 15:54 by
AlexCherrypi
Support requantizing kvcache while model is loaded
examples
server
#24367 opened 2026-06-09 15:47 by
wadealexc
Force NVFP4 W4A8 path for NVFP4_W4A16 layers on Blackwell, where NVFP4 normally uses the native W4A4 path.
documentation
testing
Nvidia GPU
python
ggml
CUDA
#24364 opened 2026-06-09 14:32 by
ynankani
vulkan: disable FA mask_opt on GCN to improve performance
Vulkan
ggml
#24362 opened 2026-06-09 14:02 by
0cc4m
mtmd, llama: (demo) shared backend sched
examples
server
#24361 opened 2026-06-09 13:51 by
ngxson
webui: scope agentic stream writes to owning conversation
examples
server/ui
#24358 opened 2026-06-09 12:56 by
ssam18
ggml-zendnn : fix DL backend loading for Ollama
ggml
AMD ZenDNN
#24342 opened 2026-06-09 08:57 by
z-sachin
Support Step3.5/3.7 flash mtp3
model
#24340 opened 2026-06-09 08:50 by
forforever73
RPC: query remote backend op support instead of assuming all ops are supported
ggml
#24325 opened 2026-06-09 04:12 by
zihaomu
args: add --video-* CLI arguments
examples
server
#24318 opened 2026-06-08 20:12 by
ngxson
Docs: Adds AI Badgr as an optional hosted GPU launch path for running a `llama.cpp` server.
#24301 opened 2026-06-08 11:20 by
michaelmanly
server: refactor/generalize input file schema
examples
server
#24299 opened 2026-06-08 11:08 by
ngxson
llama: name fused GDN outputs before callbacks
model
#24298 opened 2026-06-08 10:51 by
bogdanr
rpc : fix UAF in graph_recompute leading to remote code execution
ggml
#24292 opened 2026-06-08 07:17 by
y198nt
Cast manually for _mm_prefetch() to avoid type mismatch error on Clang
ggml
#24255 opened 2026-06-07 06:07 by
enpinion
server : add token healing support
examples
server
#24247 opened 2026-06-06 20:19 by
lucky545545
vulkan: Read `nodes_per_submit` from `GGML_VK_NODES_PER_SUBMIT` env
Vulkan
ggml
#24240 opened 2026-06-06 18:52 by
konradmb
common : default GPU_MAX_HW_QUEUES=1 on HIP to fix idle GPU load
#24237 opened 2026-06-06 16:09 by
liminfei-amd
New GGML_OP_LIGHTNING_INDEXER that implements DeepSeek V3.2/V4 lightning indexer
ggml
#24231 opened 2026-06-06 12:20 by
fairydreaming
[RFC][PoC] Intra-Prompt Pipeline Scheduling for Multi-GPU Prefill
ggml
#24219 opened 2026-06-05 23:22 by
sergey-automation
feat(mtmd): Add WebP multimodal image support through single header library
examples
#24217 opened 2026-06-05 22:15 by
tomhollingworth
CUDA: remove -sm row, refactor cuBLAS
documentation
Nvidia GPU
ggml
#24216 opened 2026-06-05 22:01 by
JohannesGaessler
server: context shift
examples
server
#24210 opened 2026-06-05 20:44 by
C-Prime90
Add specialized tagged thinking tool parser
testing
#24202 opened 2026-06-05 18:01 by
bartdeboer
docs: link function calling guide from README
#24197 opened 2026-06-05 16:45 by
Wenjunyun123
Add ROCmFP4 CPU quantization support
examples
ggml
#24185 opened 2026-06-05 13:42 by
charlie12345
Initial ET backend
documentation
testing
examples
python
server
ggml
#24179 opened 2026-06-05 11:57 by
marty1885
server: improve user message detection and create checkpoints at every user message
testing
examples
server
#24176 opened 2026-06-05 11:30 by
aldehir