ggml-org/llama.cpp
Pull Requests (open)

Refactor llama_model_quantize_params to expose a pure C interface
Labels: examples
#20346 opened 2026-03-10 14:11 by EAddario

[ggml-virtgpu] Fix some build commands of `development.md`
Labels: documentation
#20341 opened 2026-03-10 09:37 by yomaytk

llama : enable chunked fused GDN path
Labels: model, Nvidia GPU, ggml
#20340 opened 2026-03-10 08:52 by ggerganov

Fix agentic mcp image single model
Labels: examples, server
#20339 opened 2026-03-10 08:49 by ServeurpersoCom

vulkan: add GATED_DELTA_NET op support
Labels: testing, Vulkan, ggml
#20334 opened 2026-03-10 08:07 by ProgenyAlpha

Model fix qwen3vl reranker support
Labels: python
#20332 opened 2026-03-10 06:51 by ViniciosLugli

ggml : bump RPC version
Labels: ggml
#20330 opened 2026-03-10 06:12 by ggerganov

cli : fix --reasoning-budget and --chat-template-kwargs being ignored
Labels: examples
#20329 opened 2026-03-10 05:43 by TrevorS

opencl: use larger workgroup size for get_rows
Labels: ggml, OpenCL
#20316 opened 2026-03-09 21:03 by lhez

vulkan: partial revert #20084
Labels: Vulkan, ggml
#20298 opened 2026-03-09 15:21 by jeffbolznv

Handle reasoning budget
Labels: testing, examples, server
#20297 opened 2026-03-09 15:07 by pwilkin

vulkan: fix OOB check in flash_attn_mask_opt
Labels: Vulkan, ggml
#20296 opened 2026-03-09 15:07 by jeffbolznv

ci: disable coopmat on ubuntu-24-cmake-vulkan job
Labels: devops
#20294 opened 2026-03-09 14:34 by 0cc4m

[SYCL] fix op ROPE, add ROPE_BACK
Labels: documentation, ggml, SYCL
#20293 opened 2026-03-09 14:34 by arthw

Add `--force-pure-content` to force a pure content parser.
Labels: examples, server
#20289 opened 2026-03-09 12:32 by pwilkin

Gracefully handle undetected tool parser, print error message.
#20286 opened 2026-03-09 11:50 by pwilkin

Support refusal content for Responses API
Labels: examples, server
#20285 opened 2026-03-09 11:42 by pwilkin

[SYCL] fix for failed UT case: ACC, L2_NORM, UPSCALE, GEGLU
Labels: ggml, SYCL
#20283 opened 2026-03-09 10:21 by arthw

ggml-cuda: gdn use shared mem for HIP
Labels: Nvidia GPU, ggml
#20282 opened 2026-03-09 09:53 by am17an

model: add sarvam_moe architecture support
Labels: model, python
#20275 opened 2026-03-09 07:00 by sumitchatterjee13

CANN: handle in-place ROPE on non-contiguous f32 tensors
Labels: ggml, Ascend NPU
#20274 opened 2026-03-09 06:56 by noemotiovon

Read the persisted llama_kv_cell_ext for n_pos_per_embd > 1 on state_read for all sequence ids
#20273 opened 2026-03-09 06:51 by sprayandwipe

server: support chunked transfer encoding
Labels: examples, server
#20269 opened 2026-03-09 05:36 by crmky

WIP/POC: NVFP4 with CUDA SM120
Labels: documentation, testing, Nvidia GPU, examples, python, ggml
#20247 opened 2026-03-08 18:58 by michaelw9999

metal : add Metal backend for GGML_OP_GATED_DELTA_NET
Labels: ggml, Apple Metal
#20244 opened 2026-03-08 18:04 by arkavo-com

gguf-py: validate metadata values against declared types
Labels: python
#20242 opened 2026-03-08 16:30 by eyupcanakman

webui : add option to copy assistant response without thinking content
Labels: examples, server
#20238 opened 2026-03-08 15:14 by rankaiyx

Create build-apk.yml
Labels: android, examples
#20231 opened 2026-03-08 10:05 by subhasishlak123

ggml-webgpu: Add supports for `GGML_OP_REPEAT`
Labels: documentation, ggml
#20230 opened 2026-03-08 09:51 by yomaytk

Allow VisionEmbedding to recognize embedded images without loading mmproj
Labels: examples, server
#20228 opened 2026-03-08 08:07 by ChenYFan