ggerganov/llama.cpp
Open Pull Requests
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate
  #20105 opened 2026-03-04 11:39 by danbev (labels: examples)
opencl: add `set`, i32 for `cpy`
  #20101 opened 2026-03-04 07:39 by lhez (labels: ggml, OpenCL)
[WebGPU] Fix wait logic for inflight jobs
  #20096 opened 2026-03-04 03:11 by nikhilJain17 (labels: devops, ggml)
hexagon: add llama-completion windows runner script
  #20095 opened 2026-03-04 01:15 by tboinovski1 (labels: script)
opencl: add q6_K gemm and gemv kernels for Adreno
  #20089 opened 2026-03-03 19:30 by lhez (labels: ggml, OpenCL)
server: Add OpenRouter-compatible reasoning API
  #20088 opened 2026-03-03 18:40 by roj234 (labels: examples, server)
Hybrid model cache: add `--checkpoint-every-nb`
  #20087 opened 2026-03-03 18:27 by pwilkin (labels: examples, server)
llama : add attention weights extraction API [EXPERIMENTAL]
  #20086 opened 2026-03-03 17:12 by QuentinFuxa (labels: examples, python)
vulkan: Fix data races in coopmat1 mul_mat(_id)
  #20084 opened 2026-03-03 16:50 by jeffbolznv (labels: Vulkan, ggml)
CUDA: Add BF16 path to CUBLAS and increase precision of FP16 path
  #20078 opened 2026-03-03 16:02 by ORippler (labels: Nvidia GPU, ggml)
fix: correct EXAONE3 FFN_DOWN tensor mapping prefix
  #20076 opened 2026-03-03 15:47 by Bias92 (labels: python)
fix: speculative decoding broken on hybrid SSM/MoE (Qwen3.5 MoE)
  #20075 opened 2026-03-03 14:57 by eauchs
vendor : update cpp-httplib to 0.36.0
  #20073 opened 2026-03-03 14:08 by cabelo (labels: script, python)
kleidiai : support for concurrent sme and neon kernel execution
  #20070 opened 2026-03-03 12:34 by chaxu01 (labels: documentation, ggml)
cli: add /think command to toggle reasoning
  #20069 opened 2026-03-03 12:03 by roj234 (labels: examples)
ggml-webgpu: Add the support of `GGML_OP_CONCAT`
  #20068 opened 2026-03-03 11:54 by yomaytk (labels: documentation, ggml)
cli: Don't clear system prompt when using '/clear'
  #20067 opened 2026-03-03 11:30 by roj234 (labels: examples)
webui: Improvements for Models Selector UI
  #20066 opened 2026-03-03 11:09 by allozaur (labels: examples, server)
cmake: fix ARM feature detection hang on platforms without SVE/SME
  #20064 opened 2026-03-03 10:36 by mbucko (labels: ggml)
llama: parallel model loading across GPU contexts
  #20062 opened 2026-03-03 09:47 by mxxm-t
ggml : add NVFP4 quantization type support for metal
  #20060 opened 2026-03-03 08:10 by richarddd (labels: testing, python, ggml, Apple Metal)
vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap
  #20059 opened 2026-03-03 07:56 by rillomas (labels: Vulkan, ggml)
server: fix infinite retry loop when KV cache is full
  #20050 opened 2026-03-02 23:02 by ssam18 (labels: examples, server)
fix(docs): correct typos found during code review
  #20041 opened 2026-03-02 14:46 by marcelpetrick (labels: documentation, model, script, testing, Nvidia GPU, Vulkan, examples, python, server, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL, jinja parser)
contributing: limit open PRs for new contributors to 1
  #20036 opened 2026-03-02 07:37 by am17an
cann: support flash attention for head dim not multiple of 16
  #20031 opened 2026-03-02 02:40 by noemotiovon (labels: ggml, Ascend NPU)
vulkan: add UMA zero-copy async transfers and fix event_record deferred memcpy handling
  #20018 opened 2026-03-01 20:10 by neilopet (labels: testing, Vulkan, ggml)
vulkan: add sparse OOM fallback for large UMA allocations and chunked staging fallback
  #20017 opened 2026-03-01 20:02 by neilopet (labels: testing, Vulkan, ggml)
feat: add --cache-only flag to skip model re-download
  #20010 opened 2026-03-01 14:43 by lonnie08
server: add Qwen3-Reranker instruction support
  #20009 opened 2026-03-01 14:15 by schwebke (labels: examples, python, server)