ggml-org/llama.cpp
Pull Requests
Open
Enhance run-bench.ps1 with path checks and error handling
script
#24636 opened 2026-06-15 05:45 by
Eamon2009
sycl: bound in-flight expert matmuls in mul_mat_id (fix MoE OUT_OF_RESOURCES on Intel iGPU)
ggml
SYCL
#24635 opened 2026-06-15 05:20 by
mayerwin
mtmd : fix output-buffer size for multi-image batches without temporal merge
examples
#24634 opened 2026-06-15 03:06 by
mayerwin
Reduce llama-quantize peak memory use by 2.34x
#24631 opened 2026-06-15 00:05 by
i386
convert : reorder V heads for LoraTorchTensor
python
#24627 opened 2026-06-14 21:22 by
javierdejesusda
fix(convert): avoid eager KeyError when hparams lacks top-level architectures
examples
python
server/ui
#24614 opened 2026-06-14 15:53 by
franitel
vulkan: support CONV_3D
testing
Vulkan
ggml
#24612 opened 2026-06-14 15:51 by
jeffbolznv
ui: provide touch accessible model selection UI
examples
server/ui
#24604 opened 2026-06-14 09:36 by
amoshydra
[SYCL] support OPs: conv_2d, conv_2d_dw, conv2d_transpose
documentation
examples
ggml
SYCL
#24600 opened 2026-06-14 04:30 by
arthw
ci: fix vulkan docker images
Vulkan
ggml
#24595 opened 2026-06-13 21:15 by
Kononnable
spec: support eagle3 for qwen3.5 & 3.6
model
examples
server
#24593 opened 2026-06-13 20:48 by
ruixiang63
hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events)
script
python
ggml
Hexagon
#24592 opened 2026-06-13 20:47 by
max-krasnyansky
llama : suppress misleading Gemma4Assistant error during memory fitting
#24590 opened 2026-06-13 19:26 by
leotm
HIP: use hipBLAS for dense prefill on gfx900, keep MMQ for MoE
Nvidia GPU
ggml
#24588 opened 2026-06-13 17:39 by
DEV-DUFORD
vulkan: add iq4_nl support back to FA
Vulkan
ggml
#24585 opened 2026-06-13 16:50 by
jeffbolznv
vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM
testing
Nvidia GPU
Vulkan
ggml
WebGPU
#24582 opened 2026-06-13 15:22 by
jeffbolznv
vulkan: Support gated_delta_net with S_v=16
Vulkan
ggml
#24581 opened 2026-06-13 15:09 by
jeffbolznv
vulkan: support more CONCAT types
testing
Vulkan
ggml
#24579 opened 2026-06-13 15:02 by
jeffbolznv
ui: (demo) access server remotely via webrtc
examples
server/ui
#24577 opened 2026-06-13 14:45 by
ngxson
ggml: optimize concat op by replacing per-element memcpy with row-level memcpy
ggml
#24575 opened 2026-06-13 14:07 by
sirohikartik
CI: Replace flake8-no-print with flake8-debug and pin repos to hashes
#24572 opened 2026-06-13 12:41 by
jpodivin
CUDA: Add conv3d.
Nvidia GPU
ggml
CUDA
#24569 opened 2026-06-13 11:24 by
Sero1000
EXPERIMENT: meta: key external view cache by backend context
ggml
#24566 opened 2026-06-13 08:45 by
nycdubliner
[fattn-tune] Add Blackwell MMA config
Nvidia GPU
ggml
#24565 opened 2026-06-13 07:04 by
yaohengxu
CUDA: don't route RDNA3.5 flash attention to the rocWMMA kernel
Nvidia GPU
ggml
#24562 opened 2026-06-13 03:39 by
liminfei-amd
CUDA/HIP: chunked MFMA prefill kernel for GATED_DELTA_NET (CDNA)
testing
Nvidia GPU
ggml
#24561 opened 2026-06-13 03:08 by
jadenmach2
ggml-alloc : check realloc result in alloc_tensor_range
ggml
#24559 opened 2026-06-13 02:03 by
ricku777-bear
Fix 24486: TP: allows the usage of 8,9,10 gpus for stepfun
#24554 opened 2026-06-13 01:18 by
krampenschiesser
llama: copy tensor_split at model load instead of retaining caller pointer, resolving segfault
#24552 opened 2026-06-13 01:07 by
dragonfyre13
llama : disable graph reuse when contexts share memory under SPLIT_MODE_TENSOR
#24549 opened 2026-06-12 23:52 by
nycdubliner