ggerganov/llama.cpp
Open Pull Requests
HIP: tune mmq/rocblas switching for RDNA4
#18816 opened 2026-01-13 16:19 by jiachengjason
CUDA: fix alignment on register spill for FA
#18815 opened 2026-01-13 16:16 by JohannesGaessler
vulkan: work around Intel ANV fp16 bug in MMQ
#18814 opened 2026-01-13 16:16 by 0cc4m
sampling : remove sampling branching in output_reserve
#18811 opened 2026-01-13 15:10 by danbev
ggml-cuda : fix warp misaligned error in debug builds [Nvidia GPU, ggml]
#18799 opened 2026-01-13 06:36 by danbev
llama: fix integer type consistency in split helpers
#18798 opened 2026-01-13 04:15 by MaheshJakkala
CANN: fix an issue where get_env was not fully renamed [devops, ggml, Ascend NPU]
#18796 opened 2026-01-13 02:53 by noemotiovon
Unified delta net handling for Qwen3Next and Kimi Linear models [model]
#18792 opened 2026-01-12 19:06 by pwilkin
ci, tests : use cmake to download models and remove libcurl dependency [build, testing, examples, devops]
#18791 opened 2026-01-12 18:04 by angt
server: improve slots scheduling for n_cmpl [examples, python, server]
#18789 opened 2026-01-12 17:17 by ngxson
server: fix memory reservations in populate_token_probs [examples, server]
#18787 opened 2026-01-12 15:54 by l-austenfeld
CUDA: Factor out and re-use `block_reduce` function [Nvidia GPU, ggml]
#18785 opened 2026-01-12 15:09 by ORippler
ggml-cpu: add RVV vec dot kernels for quantization types [ggml]
#18784 opened 2026-01-12 14:54 by taimur-10x
webui : send both backend_sampling == false/true [examples, server]
#18781 opened 2026-01-12 13:22 by ggerganov
vocab: add tokenizer support for jina-embeddings-v2-base-zh [python]
#18756 opened 2026-01-11 11:58 by o7si
Kimi-Linear support (backend agnostic + MLA KV cache) [model, python, ggml]
#18755 opened 2026-01-11 11:55 by ymcki
fix: OOB reads in UGM tokenizer (precompiled_charsmap handling)
#18750 opened 2026-01-11 08:57 by hourhl
ggml, llama : add KV cache size limiting and block tracking infrastructure [model, testing, examples, ggml]
#18747 opened 2026-01-11 00:46 by pestopoppa
fix: use actual tensor embedding dimension instead of model parameter
#18745 opened 2026-01-10 22:15 by chrismuzyn
server: add missing rerank and chat presets (#10932)
#18742 opened 2026-01-10 17:02 by ingyukoh
POC: group gate_exps and up_exps + fix mxfp4 alignment for PP boost [model, python]
#18740 opened 2026-01-10 15:17 by am17an
llama: add canaries to Markdown files
#18735 opened 2026-01-10 11:03 by JohannesGaessler
feat: add support for WeDLM architecture [python]
#18731 opened 2026-01-10 02:07 by feedseawave
lookup, lookahead: fix crash when n_ctx not specified [examples]
#18729 opened 2026-01-10 00:09 by pestopoppa
llama: fix pooled embedding readback sizing/stride and state I/O
#18723 opened 2026-01-09 18:43 by retr0reg
model: Add VAETKI support [model, examples, python]
#18719 opened 2026-01-09 14:42 by dororodoroddo
ggml: new backend for Virglrenderer API Remoting acceleration (v2) [build, python, ggml]
#18718 opened 2026-01-09 13:29 by kpouget
Support parsing JSON into grammar for schemas with no type and no properties
#18711 opened 2026-01-09 07:37 by markrietveld
vulkan: Check maxStorageBufferRange in supports_op [Vulkan, ggml]
#18709 opened 2026-01-09 03:40 by jeffbolznv
fix text spacing in print_info
#18708 opened 2026-01-09 02:29 by ddh0