Pull Requests ggerganov/llama.cpp

WebUI: New server loading page examples server

#18909 opened 2026-01-18 03:27 by dariusjlukas

introduce legacy-torch flag for backward compatibility on older Intel… python

#18908 opened 2026-01-18 02:50 by csabakecskemeti

graph : utilize `ggml_build_forward_select()` to avoid reallocations model devops

#18898 opened 2026-01-17 14:09 by ggerganov

HIP: add mmf for CDNA Nvidia GPU ggml

#18896 opened 2026-01-17 13:44 by zhang-hui-yulo

llama: fix integer type consistency in split helpers

#18894 opened 2026-01-17 10:15 by MaheshJakkala

examples: llama evaluation tool for mmlu, aime, gsm8k examples python

#18892 opened 2026-01-17 02:52 by gatbontonpc

fit-params : Handle n_ctx 0 for models that entirely fit with n_ctx_train

#18890 opened 2026-01-17 01:26 by 65a

ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 ggml

#18888 opened 2026-01-16 23:42 by Alcpz

DirectIO Model Loading: Extend and fix Fallback

#18887 opened 2026-01-16 23:11 by JTischbein

llama : add MTP API model

#18886 opened 2026-01-16 22:50 by ngxson

gguf: display strerrno when can't load a model ggml

#18884 opened 2026-01-16 21:20 by teto

llama-bench: add global --seed and reduce per-token synchronization examples

#18879 opened 2026-01-16 17:21 by StanByriukov02

Metal : Supplement floor operator ggml Apple Metal

#18878 opened 2026-01-16 15:40 by Old-cpu

Try fixing non-ASCII parameters in llama-cli on Windows examples

#18872 opened 2026-01-16 00:20 by forshtat

opencl: add optimized q8_0 mm kernel for adreno ggml OpenCL

#18871 opened 2026-01-15 23:01 by shaofeiqi

convert_hf_to_gguf.py: refactor modify_tensors to call super python

#18866 opened 2026-01-15 15:01 by am17an

sampling : update outdated comment about has_sampled [no ci]

#18863 opened 2026-01-15 13:04 by danbev

sampling : add support for saving/loading backend sampling state testing

#18862 opened 2026-01-15 12:26 by danbev

wasm, tests: fix ctests with emscripten build testing ggml

#18861 opened 2026-01-15 12:24 by aviallon

ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) ggml

#18860 opened 2026-01-15 10:58 by Alcpz

ggml-cpu: add RVV vec dot kernels for quantization types ggml

#18859 opened 2026-01-15 10:08 by rehan-10xengineer

ggml-cpu: add q4_0 repack support for wasm ggml

#18858 opened 2026-01-15 09:59 by aviallon

enforce response_format and json_schema for Kimi K2 testing

#18851 opened 2026-01-15 03:01 by akoumjian

Deepseek v3.2 dense attention support from @fairydreaming python

#18849 opened 2026-01-14 22:13 by createthis

[RFC] Integrate sparse-ternary-fma for TQ2_0 quantization testing ggml

#18836 opened 2026-01-14 10:44 by HyperFoldUK

vulkan: Revert forced full subgroup for FlashAttention Vulkan ggml

#18831 opened 2026-01-14 08:38 by rillomas

model: Add PaddleOCR-VL model support model examples python

#18825 opened 2026-01-14 06:15 by megemini

ggml-backend: Separate dynamic lib install and search paths, add relative search ggml

#18817 opened 2026-01-13 20:01 by DaAwesomeP

HIP: tune mmq/rocblas switching for RDNA4 Nvidia GPU ggml

#18816 opened 2026-01-13 16:19 by jiachengjason

sampling : remove sampling branching in output_reserve

#18811 opened 2026-01-13 15:10 by danbev