ggerganov/llama.cpp
Pull Requests
Open
log: add condition of debug log
#24054 opened 2026-06-03 02:10 by
ninedreams
cmake: skip cvector-generator and export-lora when CPU backend is disabled
model
script
testing
examples
python
server
ggml
OpenCL
Hexagon
#24053 opened 2026-06-02 21:46 by
arichiardi
common: centralise datetime formatting
examples
server
jinja parser
#24047 opened 2026-06-02 20:23 by
socram8888
[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops
ggml
WebGPU
#24044 opened 2026-06-02 18:56 by
nikhilJain17
server: avoid unnecessary checkpoint invalidation for recurrent / hybrid models
examples
server
#24035 opened 2026-06-02 16:52 by
Regrad
jinja: implement map('filter')
testing
jinja parser
#24033 opened 2026-06-02 16:08 by
MuoDoo
ui: Mermaid Diagrams in chat + interactive preview
examples
server/ui
#24032 opened 2026-06-02 16:02 by
allozaur
tests : add support for qwen3 SSM archs
model
#24031 opened 2026-06-02 16:00 by
ggerganov
Avoid PDL race conditions by disabling __restrict__ when PDL is used
Nvidia GPU
ggml
#24030 opened 2026-06-02 15:42 by
aendk
server: persist slot checkpoints to .ckpt sidecar (LSCKPT2)
examples
server
#24028 opened 2026-06-02 15:08 by
LuminaNAO
qwen35: use post-norm hidden state for MTP
model
#24025 opened 2026-06-02 13:49 by
am17an
tools/ui: add OAuth support for MCP servers
examples
server/ui
#24023 opened 2026-06-02 13:06 by
LPFchan
metal : per-op source split + parallel compile
ggml
Apple Metal
#24021 opened 2026-06-02 12:38 by
forforever73
mtmd: correct gemma4 min/max tokens
examples
#24014 opened 2026-06-02 09:08 by
ngxson
ggml: support concat for scalar types at cuda backend
testing
Nvidia GPU
ggml
#24011 opened 2026-06-02 07:38 by
zihaomu
server: add KV cache metrics
examples
python
server
#24010 opened 2026-06-02 06:41 by
lvsijian8
[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator
ggml
WebGPU
#24000 opened 2026-06-02 01:40 by
nikhilJain17
ggml: add f16 out_prod support for CPU and out_prod op for Vulkan
Vulkan
ggml
#23997 opened 2026-06-01 21:55 by
Lamothe
Make GGML_SYCL_F16=ON the default
documentation
examples
devops
ggml
SYCL
#23996 opened 2026-06-01 21:23 by
malsbat
server/common: Fix `response_format: json_schema` & prefill parsing bug
examples
server
#23993 opened 2026-06-01 20:04 by
roj234
metal: optimize pad_reflect_1d_f32 kernel
testing
ggml
Apple Metal
#23992 opened 2026-06-01 19:58 by
shrivasshankar
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads
testing
Vulkan
ggml
#23991 opened 2026-06-01 19:57 by
jeffbolznv
HIP: fixing SSM_SCAN backend testcase error
Nvidia GPU
ggml
#23983 opened 2026-06-01 16:50 by
jiachengjason
server: (router) add model management API
examples
python
server
#23976 opened 2026-06-01 14:45 by
ngxson
build : use umbrella Headers directory for XCFramework module map
#23974 opened 2026-06-01 14:03 by
gmarzjr
vulkan: add fast path for contiguous buffer transfers
Vulkan
ggml
#23973 opened 2026-06-01 13:47 by
winstonma
vulkan: add fwht support for Intel with shmem reduction
Vulkan
ggml
#23964 opened 2026-06-01 10:33 by
0cc4m
ggml-cpu : add AVX2 and AVX optimization for nvfp4 dot product
ggml
#23961 opened 2026-06-01 08:24 by
ragz4125
common: vectorize common_embd_similarity_cos for powerpc.
#23960 opened 2026-06-01 07:40 by
shalinib-ibm
Simple implementation for limiting shell commands.
examples
server
#23956 opened 2026-06-01 05:00 by
Penguin-Guru