ggerganov/llama.cpp
Open pull requests
Add cohere2moe to llama-vocab for TINY_AYA
#24601 opened 2026-06-14 04:37 by bartowski1182
[SYCL] support OPs: conv_2d, conv_2d_dw, conv2d_transpose
Labels: documentation, examples, ggml, SYCL
#24600 opened 2026-06-14 04:30 by arthw
server: add system message prefix feature
Labels: examples, python, server
#24599 opened 2026-06-14 02:15 by coder543
ci: fix vulkan docker images
Labels: Vulkan, ggml
#24595 opened 2026-06-13 21:15 by Kononnable
ci : use CUDA label for cuda backend
Labels: devops
#24594 opened 2026-06-13 20:58 by CISC
spec: support eagle3 for qwen3.5 & 3.6
Labels: model, examples, server
#24593 opened 2026-06-13 20:48 by ruixiang63
hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events)
Labels: script, python, ggml, Hexagon
#24592 opened 2026-06-13 20:47 by max-krasnyansky
llama : suppress misleading Gemma4Assistant error during memory fitting
#24590 opened 2026-06-13 19:26 by leotm
HIP: use hipBLAS for dense prefill on gfx900, keep MMQ for MoE
Labels: Nvidia GPU, ggml
#24588 opened 2026-06-13 17:39 by DEV-DUFORD
vulkan: add iq4_nl support back to FA
Labels: Vulkan, ggml
#24585 opened 2026-06-13 16:50 by jeffbolznv
[SYCL] add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp
Labels: documentation, ggml, merge ready, SYCL
#24584 opened 2026-06-13 16:23 by arthw
vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM
Labels: testing, Nvidia GPU, Vulkan, ggml, WebGPU
#24582 opened 2026-06-13 15:22 by jeffbolznv
vulkan: Support gated_delta_net with S_v=16
Labels: Vulkan, ggml
#24581 opened 2026-06-13 15:09 by jeffbolznv
vulkan: support more CONCAT types
Labels: testing, Vulkan, ggml
#24579 opened 2026-06-13 15:02 by jeffbolznv
[SYCL]fix reorder function crash:GGML_ASSERT(block_num_y % num_subgroups ==0)
Labels: examples, ggml, merge ready, SYCL
#24578 opened 2026-06-13 14:50 by arthw
ui: (demo) access server remotely via webrtc
Labels: examples, server/ui
#24577 opened 2026-06-13 14:45 by ngxson
ggml: optimize concat op by replacing per-element memcpy with row-level memcpy
Labels: ggml
#24575 opened 2026-06-13 14:07 by sirohikartik
CI: Replace flake8-no-print with flake8-debug and pin repos to hashes
#24572 opened 2026-06-13 12:41 by jpodivin
CUDA: Add conv3d.
Labels: Nvidia GPU, ggml
#24569 opened 2026-06-13 11:24 by Sero1000
EXPERIMENT: meta: key external view cache by backend context
Labels: ggml
#24566 opened 2026-06-13 08:45 by nycdubliner
[fattn-tune] Add Blackwell MMA config
Labels: Nvidia GPU, ggml
#24565 opened 2026-06-13 07:04 by yaohengxu
[SYCL] Enhance set_rows to support q1_0, mxfp4, nvfp4
Labels: documentation, ggml, merge ready, SYCL
#24564 opened 2026-06-13 06:48 by arthw
CUDA: don't route RDNA3.5 flash attention to the rocWMMA kernel
Labels: Nvidia GPU, ggml
#24562 opened 2026-06-13 03:39 by liminfei-amd
CUDA/HIP: chunked MFMA prefill kernel for GATED_DELTA_NET (CDNA)
Labels: testing, Nvidia GPU, ggml
#24561 opened 2026-06-13 03:08 by jadenmach2
ggml-alloc : check realloc result in alloc_tensor_range
Labels: ggml
#24559 opened 2026-06-13 02:03 by ricku777-bear
Fix 24486: TP: allows the usage of 8,9,10 gpus for stepfun
#24554 opened 2026-06-13 01:18 by krampenschiesser
llama: copy tensor_split at model load instead of retaining caller pointer, resolving segfault
#24552 opened 2026-06-13 01:07 by dragonfyre13
llama : disable graph reuse when contexts share memory under SPLIT_MODE_TENSOR
#24549 opened 2026-06-12 23:52 by nycdubliner
Reduce RSS during BF16 GGUF export
Labels: python
#24548 opened 2026-06-12 23:11 by i386
ggml-cuda: use universal launch bounds for MoE MMVQ kernel
Labels: Nvidia GPU, ggml
#24547 opened 2026-06-12 23:05 by batot1