ggml-org/llama.cpp: Pull Requests (Open)
- #19959 Use fp32 in cuBLAS V100 to avoid overflows, env variables to override cuBLAS compute type [Nvidia GPU, ggml] (opened 2026-02-27 19:25 by wallentri88)
- #19958 llama : add native param2moe architecture support [model, python] (opened 2026-02-27 19:01 by iambhuvan)
- #19954 tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice [examples] (opened 2026-02-27 14:37 by angt)
- #19952 scripts : improve get-wikitext-2.sh [script] (opened 2026-02-27 13:19 by angt)
- #19941 [New quant] Q3_PT [examples, python, ggml] (opened 2026-02-26 22:15 by pwilkin)
- #19939 webui: use date in more human readable exported filename [examples, server] (opened 2026-02-26 20:15 by woof-dog)
- #19938 scripts: ini_to_opencode.py [script, python] (opened 2026-02-26 18:13 by am17an)
- #19936 fix dots.ocr: correct RoPE sections and FFN tensor mapping [examples, python] (opened 2026-02-26 16:27 by anthony-maio)
- #19934 common : update completion executables list [no ci] (opened 2026-02-26 14:12 by danbev)
- #19931 tool parser: add GigaChatV3/3.1 models support in PEG format [testing] (opened 2026-02-26 13:07 by Mishusha)
- #19927 metal: add CONV_3D [ggml, Apple Metal] (opened 2026-02-26 11:24 by Ra5hidIslam)
- #19922 llama/ggml: multi-GPU pipeline parallelism (xdev host staging) + faster model loading [Nvidia GPU, ggml] (opened 2026-02-26 09:43 by mxxm-t)
- #19916 ggml-cuda: add mem check for fusion [Nvidia GPU, ggml] (opened 2026-02-26 05:53 by am17an)
- #19914 vendors: update miniaudio library to 0.11.24 [script, python] (opened 2026-02-26 04:37 by data-man)
- #19896 test-backend-ops: allow loading tests from JSON and parsing model operators into JSON [testing, examples] (opened 2026-02-25 15:13 by 0cc4m)
- #19861 [ggml-quants] Add memsets and other fixes for IQ quants [ggml] (opened 2026-02-24 20:04 by bartowski1182)
- #19855 server : add default-model preset and fallback logic [examples, server] (opened 2026-02-24 16:30 by mikhail-shevtsov-wiregate)
- #19850 ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1` in binary ops [testing, ggml] (opened 2026-02-24 12:20 by yomaytk)
- #19841 server : add chat truncation to keep chat going [testing, examples, python, server] (opened 2026-02-23 21:36 by ltoniazzi)
- #19840 opencl: add optimized q4_1 mm kernel for adreno [ggml, OpenCL] (opened 2026-02-23 21:36 by shaofeiqi)
- #19833 sampling : support multiple outputs per sequence [testing, examples, server] (opened 2026-02-23 13:51 by danbev)
- #19832 Add Aya 101 multi-lingual translation support to llama.cpp [examples, python, server] (opened 2026-02-23 13:23 by Acceldium)
- #19828 common : refactor cache to use hierarchical directory layout (opened 2026-02-23 12:30 by angt)
- #19827 Kimi Linear block implementation [model] (opened 2026-02-23 12:14 by ymcki)
- #19812 implemented max pooling for embeddings [examples, python, server] (opened 2026-02-22 19:03 by lorenzocesconetto)
- #19802 llama: end-to-end tests [model, testing] (opened 2026-02-22 10:59 by JohannesGaessler)
- #19796 Add model metadata loading from huggingface for use with tests requiring real model data [testing] (opened 2026-02-22 05:07 by bartowski1182)
- #19791 tools : add learning-cache tool for persistent latent context [examples] (opened 2026-02-22 00:55 by arkavo-com)
- #19780 [WIP] ggml-hexagon: convert f32 to f16 - fa opt part4 [ggml] (opened 2026-02-21 15:13 by chraac)
- #19772 Clean up per-thread parameter buffer pool and job submission logic [ggml] (opened 2026-02-20 23:27 by nikhilJain17)