ggerganov/llama.cpp
Pull Requests
Commits
Open
Closed
arch : add missing tensor name for Qwen 3.5
#21219 opened 2026-03-31 09:11 by
ownia
CANN: split RoPE cache init into host and device phases
#21218 opened 2026-03-31 08:43 by
noemotiovon
fix: correct misspellings in code comments
#21217 opened 2026-03-31 08:37 by
lainon1
common : simplify autoparser tagged parser rules
#21216 opened 2026-03-31 08:08 by
aldehir
common : cleanup logs and modernize the progress bar
#21215 opened 2026-03-31 08:05 by
angt
common : gpt-oss handle builtins and unsolicited tool calls
testing
#21213 opened 2026-03-31 07:05 by
aldehir
opencl: fix leak in Adreno q8_0 path
ggml
OpenCL
#21212 opened 2026-03-31 06:54 by
lhez
CI: Enable CPU and Vulkan ARM64 Release
devops
#21207 opened 2026-03-31 05:21 by
ehfd
webui: fix syntax highlighting lost after streaming for non-common languages
examples
server
#21206 opened 2026-03-31 05:21 by
hmblair
CANN: Add support for Qwen35 ops
testing
ggml
Ascend NPU
#21204 opened 2026-03-31 03:40 by
hipudding
server: respect the ignore eos flag
examples
python
server
#21203 opened 2026-03-31 01:49 by
ykhrustalev
Fix undefined timing measurement errors in server context
examples
server
#21201 opened 2026-03-30 22:47 by
thedanhoffman
llama-server: translating structured generation request parameters from responses API format to completions API format
examples
python
server
#21187 opened 2026-03-30 17:02 by
earslap
[SYCL] Enhance flash-attention performance
ggml
SYCL
#21185 opened 2026-03-30 14:53 by
arthw
tests: allow exporting graph ops from HF file without downloading weights
testing
#21182 opened 2026-03-30 13:09 by
0cc4m
Add API key server support with optional arguments --api-key and --ju…
examples
python
#21180 opened 2026-03-30 12:40 by
gelim
common : init in params parser, add Windows UTF-8 support
testing
examples
server
#21176 opened 2026-03-30 08:40 by
angt
server: improve Responses API compliance and Codex CLI compatibility
examples
python
server
#21174 opened 2026-03-30 07:54 by
krystophny
ggml-cuda: fix ROCm multi-GPU illegal memory access in recurrent state restore
Nvidia GPU
ggml
#21170 opened 2026-03-30 03:43 by
uaruss
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels
Nvidia GPU
ggml
#21168 opened 2026-03-30 00:11 by
iacopPBK
contrib : clarify code origin guidelines
#21165 opened 2026-03-29 22:03 by
ddh0
cpp: Adding new arch RUGPT3XL
model
python
#21161 opened 2026-03-29 19:55 by
EvilFreelancer
Cross-backend profiler
documentation
Nvidia GPU
Vulkan
examples
python
ggml
SYCL
Apple Metal
Ascend NPU
OpenCL
IBM zDNN
Hexagon
OpenVINO
WebGPU
#21160 opened 2026-03-29 19:53 by
pwilkin
[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel
Nvidia GPU
ggml
#21159 opened 2026-03-29 19:16 by
gaugarg-nv
ggml-cpu: fix fallback for RVV kernels without zvfh
ggml
#21157 opened 2026-03-29 18:45 by
taimur-10x
examples : add llama-eval
examples
python
#21152 opened 2026-03-29 14:52 by
ggerganov
Support for DeepseekV32ForCausalLM with DeepSeek Sparse Attention (DSA)
model
testing
Nvidia GPU
python
ggml
#21149 opened 2026-03-29 12:56 by
fairydreaming
chore(docs): update list of UIs
#21148 opened 2026-03-29 11:23 by
sbhjt-gr
ggml-webgpu: Add the support of `MUL_MAT_ID`
documentation
ggml
WebGPU
#21147 opened 2026-03-29 10:53 by
yomaytk
common: add two-phase graceful reasoning budget termination ...
testing
#21141 opened 2026-03-29 03:03 by
zeel2104