ggerganov/llama.cpp
Pull Requests
Open
doc: ban AI-generated PR descriptions [no ci]
#18765 opened 2026-01-11 21:39 by
JohannesGaessler
vulkan: Disable large coopmat matmul configuration on proprietary AMD driver
Vulkan
ggml
#18763 opened 2026-01-11 20:36 by
0cc4m
vocab: add tokenizer support for jina-embeddings-v2-base-zh
python
#18756 opened 2026-01-11 11:58 by
o7si
Kimi-Linear support (backend agnostic + MLA KV cache)
model
python
ggml
#18755 opened 2026-01-11 11:55 by
ymcki
fix: OOB reads in UGM tokenizer (precompiled_charsmap handling)
#18750 opened 2026-01-11 08:57 by
hourhl
ggml, llama : add CPU paged attention for memory-efficient KV cache
model
testing
examples
ggml
#18747 opened 2026-01-11 00:46 by
pestopoppa
fix: use actual tensor embedding dimension instead of model parameter
#18745 opened 2026-01-10 22:15 by
chrismuzyn
server: add missing rerank and chat presets (#10932)
#18742 opened 2026-01-10 17:02 by
ingyukoh
POC: group gate_exps and up_exps + fix mxfp4 alignment for PP boost
model
python
#18740 opened 2026-01-10 15:17 by
am17an
llama: add canaries to Markdown files
#18735 opened 2026-01-10 11:03 by
JohannesGaessler
feat: add support for WeDLM architecture
python
#18731 opened 2026-01-10 02:07 by
feedseawave
lookup, lookahead: fix crash when n_ctx not specified
examples
#18729 opened 2026-01-10 00:09 by
pestopoppa
llama: fix pooled embedding readback sizing/stride and state I/O
#18723 opened 2026-01-09 18:43 by
retr0reg
model: Add VAETKI support
model
examples
python
#18719 opened 2026-01-09 14:42 by
dororodoroddo
ggml: new backend for Virglrenderer API Remoting acceleration (v2)
build
python
ggml
#18718 opened 2026-01-09 13:29 by
kpouget
Support parsing JSON into grammar for schemas with no type and no properties
#18711 opened 2026-01-09 07:37 by
markrietveld
vulkan: Check maxStorageBufferRange in supports_op
Vulkan
ggml
#18709 opened 2026-01-09 03:40 by
jeffbolznv
fix text spacing in print_info
#18708 opened 2026-01-09 02:29 by
ddh0
ggml-metal: Clean up files used for embedded build
ggml
Apple Metal
#18705 opened 2026-01-09 00:36 by
DaAwesomeP
[WIP] ggml-opencl: op args init refactoring
ggml
OpenCL
#18701 opened 2026-01-08 16:49 by
chraac
Improving inference speed for the repack buffer type on NUMA architectures
ggml
#18698 opened 2026-01-08 15:01 by
zzjianhui
ggml-cuda: extend concat support for more types
Nvidia GPU
ggml
#18690 opened 2026-01-08 07:36 by
Lourdle
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id)
testing
Vulkan
ggml
#18678 opened 2026-01-07 20:56 by
jeffbolznv
Autoparser - complete refactoring of parser architecture
documentation
model
script
testing
examples
python
server
#18675 opened 2026-01-07 18:45 by
pwilkin
server: support image+text input for embeddings (Qwen3-VL-Embedding)
examples
server
#18665 opened 2026-01-07 12:56 by
ngxson
MCP MVP
enhancement
server/webui
examples
server
#18655 opened 2026-01-07 08:32 by
allozaur
docs: update ops.md for CANN backend
documentation
#18654 opened 2026-01-07 08:22 by
hipudding
CANN: support gated linear attn
ggml
Ascend NPU
#18653 opened 2026-01-07 02:55 by
hipudding
common: use httplib + boringssl by default
build
devops
#18648 opened 2026-01-06 20:30 by
ngxson
[Do Not Merge] model : LFM2.5-Audio-1.5B
model
examples
python
server
#18641 opened 2026-01-06 14:25 by
tdakhran