ggerganov/llama.cpp
Pull Requests
Commits
Open
Closed
CUDA: Support CUDA Virtual Devices
ggml
CUDA
#25228 opened 2026-07-02 09:06 by anavp-nvidia
server : don't list cached models when a preset is used
server
#25226 opened 2026-07-02 07:38 by angt
[SYCL] Flash Attention with XMX engine via oneDNN graph API (SDPA) on KV f16; Qwen3.6-27b-Q8_0 prefill speed up x1.21 at p=512 and x4.26 at p=80k
ggml
SYCL
#25222 opened 2026-07-02 06:26 by hmscider
common : add missing <fstream> include in common.h
#25220 opened 2026-07-02 05:39 by zhangrunda
sycl: add fused top-k MoE
documentation
ggml
merge ready
SYCL
#25217 opened 2026-07-02 03:55 by newjordan
hexagon: add VISION RoPE support
ggml
Hexagon
#25216 opened 2026-07-02 03:12 by aparmp-quic
server: add --no-sleep flag for GPU heartbeat on headless GPUs
Vulkan
server
ggml
SYCL
CUDA
#25214 opened 2026-07-01 21:02 by johnkarlhill
llama: optimize RWKV7 inference by fusing some graph operators
model
testing
Vulkan
ggml
SYCL
Apple Metal
CUDA
#25206 opened 2026-07-01 16:41 by MollySophia
sycl: add GGML_SYCL_FATTN_VEC_NTHREADS build option
ggml
SYCL
#25205 opened 2026-07-01 15:51 by Titaniumtown
llama: fix quantized kv-cache for dsv4
model
#25202 opened 2026-07-01 12:44 by am17an
llama-cli: fix passing chat_template_kwargs and reasoning_format params
examples
#25201 opened 2026-07-01 12:41 by percontation
ggml-cpu: Enable tiled matmul on AIX
ggml
#25199 opened 2026-07-01 12:17 by shalinib-ibm
vulkan: disable async transfer queue on amdvlk (mitigate MoE partial-offload crash)
Vulkan
ggml
#25196 opened 2026-07-01 08:05 by liminfei-amd
vulkan: Remove crash guard for Intel GPU
Vulkan
ggml
#25192 opened 2026-07-01 06:49 by rillomas
openvino: fix SWA mask detection for long prompts
ggml
OpenVINO
#25189 opened 2026-07-01 02:56 by zlma7001
CUDA/HIP: add Q2_0 (PrismML ternary 1.58-bit) support
ggml
CUDA
#25188 opened 2026-07-01 02:32 by The-Monk
spec: add backend sampling for DFlash
#25180 opened 2026-06-30 18:41 by ruixiang63
tests: Source-level separation between llama.cpp and ggml
testing
#25179 opened 2026-06-30 18:33 by ckastner
metal: add col2im_1d op (f32/f16/bf16)
ggml
Apple Metal
#25176 opened 2026-06-30 17:44 by ServeurpersoCom
spec: add DSpark speculative decoding
documentation
model
testing
conversion
#25173 opened 2026-06-30 13:55 by wjinxu
grammar : recognize '|' at start of continuation line
testing
#25170 opened 2026-06-30 11:23 by o7si
hexagon: allow dflash lm-head offload experiment
model
examples
ggml
Hexagon
#25166 opened 2026-06-30 09:43 by Salanfeng
Add support for Laguna XS.2 & M.1
model
testing
ggml
CUDA
conversion
#25165 opened 2026-06-30 09:31 by joerowell
ggml : fix wrong transpose function for int16 data
ggml
#25161 opened 2026-06-30 07:13 by I3eg1nner
cuda: fix crash when querying memory on device with no free memory.
ggml
CUDA
#25157 opened 2026-06-30 03:43 by cphlipot
ggml: imatrix-aware NVFP4 quantization (scale search) + wire NVFP4 ftype
examples
ggml
#25153 opened 2026-06-30 00:59 by avifenesh
common, server : preserve HF file for cached models
server
#25152 opened 2026-06-29 23:44 by mrexodia
CUDA: add COL2IM_1D op
documentation
ggml
CUDA
#25151 opened 2026-06-29 22:53 by Ssamdeman
speculative: fix MTP draft crash on vision inputs
#25144 opened 2026-06-29 19:15 by ServeurpersoCom
llama : add position-relocatable KV range save/load
testing
#25133 opened 2026-06-29 14:36 by Anyesh