Pull Requests ggml-org/llama.cpp

CUDA: fix should_use_mmvf for ne11 == 1

#17085 opened 2025-11-07 16:34 by JohannesGaessler

convert: (demo) repacking compressed_tensor format of kimi-k2 (labels: python)

#17083 opened 2025-11-07 15:36 by ngxson

fixed a missing case for transposed copy (labels: testing, Nvidia GPU, ggml)

#17081 opened 2025-11-07 15:20 by bssrdf

CUDA: skip fusion for repeating adds in bias (labels: testing, Nvidia GPU, ggml)

#17080 opened 2025-11-07 14:59 by am17an

server : handle failures to restore host cache (labels: examples, server)

#17078 opened 2025-11-07 13:58 by ggerganov

HIP: RDNA4 tensor core support for MMF (labels: Nvidia GPU, ggml)

#17077 opened 2025-11-07 13:37 by zhang-hui-yulo

docs: fix typos in some files

#17074 opened 2025-11-07 12:09 by khanhkhanhlele

arg: add --cache-list argument to list cached models

#17073 opened 2025-11-07 12:07 by ngxson

[RFC] ggml: new backend for API Remoting (labels: build, ggml, Apple Metal)

#17072 opened 2025-11-07 11:15 by kpouget

convert : handle compressed-tensors quant method (labels: enhancement, python)

#17069 opened 2025-11-07 03:03 by compilade

Fix NetBSD compilation error

#17068 opened 2025-11-07 02:01 by xinitrcn1

Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (labels: documentation, testing, Nvidia GPU, ggml)

#17063 opened 2025-11-06 21:04 by pwilkin

cmake: add option to build and link BoringSSL (labels: build)

#17062 opened 2025-11-06 20:11 by angt

[WIP] s390x ci: debug build issue (labels: devops)

#17053 opened 2025-11-06 14:06 by AlekseiNikiforovIBM

# Add Megrez-MoE Architecture Support ggml-org#16724 (labels: model)

#17052 opened 2025-11-06 13:18 by tamarPal

cuda: extended MMF_ROWS_PER_BLOCK (labels: Nvidia GPU, ggml)

#17051 opened 2025-11-06 13:07 by zhang-hui-yulo

fix : Dangling pointer for non-empty trigger words in lazy grammar construction

#17048 opened 2025-11-06 10:24 by marek-hradil

kv-cache : pad the cache size to 256 for performance (labels: examples, python, server)

#17046 opened 2025-11-06 08:17 by ggerganov

Add MoE dynamic routing with expert caching (labels: documentation, build, examples)

#17044 opened 2025-11-06 05:11 by jmangold23

ggml-hexagon: fix `test-backend-ops` failures on specific binary ops (labels: ggml)

#17042 opened 2025-11-06 02:09 by chraac

server/public_simplechat alternate web client ui with 0 setup builtin tool calling++, reasoning - refactored, SysDateTime, rename pdftext (labels: examples, python, server)

#17038 opened 2025-11-05 23:32 by hanishkvc

common: "Profile Guided Speculative Decoding"

#17034 opened 2025-11-05 18:46 by jukofyork

CUDA: only use moe_expert_reduce when n_tokens=1 (labels: Nvidia GPU, ggml)

#17032 opened 2025-11-05 17:08 by am17an

ggml webgpu: faster matrix multiplication/matrix-vector multiplication (labels: python, devops, ggml)

#17031 opened 2025-11-05 17:02 by reeselevine

ggml-cpu: handle 3d tensors in repack mat_mul (labels: ggml)

#17030 opened 2025-11-05 16:59 by Alcpz

tests(test-backend-ops): Test backend ops verbosity (labels: testing)

#17029 opened 2025-11-05 16:57 by gabe-l-hart

examples(eval-callback): Eval callback verbosity (labels: examples)

#17028 opened 2025-11-05 16:45 by gabe-l-hart

vulkan: Fix test-thread-safety crashes (labels: Vulkan, ggml)

#17024 opened 2025-11-05 15:46 by jeffbolznv

cuda/vulkan : bicubic interpolation (labels: testing, Nvidia GPU, Vulkan, ggml, OpenCL)

#17022 opened 2025-11-05 12:11 by Acly

ci: add Arm-hosted Graviton4 runner (labels: devops)

#17021 opened 2025-11-05 11:57 by sudhiarm