ggerganov/llama.cpp
Open Pull Requests
WebUI: New server loading page [examples, server] · #18909 opened 2026-01-18 03:27 by dariusjlukas
introduce legacy-torch flag for backward compatibility on older Intel… [python] · #18908 opened 2026-01-18 02:50 by csabakecskemeti
graph : utilize `ggml_build_forward_select()` to avoid reallocations [model, devops] · #18898 opened 2026-01-17 14:09 by ggerganov
HIP: add mmf for CDNA [Nvidia GPU, ggml] · #18896 opened 2026-01-17 13:44 by zhang-hui-yulo
llama: fix integer type consistency in split helpers · #18894 opened 2026-01-17 10:15 by MaheshJakkala
examples: llama evaluation tool for mmlu, aime, gsm8k [examples, python] · #18892 opened 2026-01-17 02:52 by gatbontonpc
fit-params : Handle n_ctx 0 for models that entirely fit with n_ctx_train · #18890 opened 2026-01-17 01:26 by 65a
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 [ggml] · #18888 opened 2026-01-16 23:42 by Alcpz
DirectIO Model Loading: Extend and fix Fallback · #18887 opened 2026-01-16 23:11 by JTischbein
llama : add MTP API [model] · #18886 opened 2026-01-16 22:50 by ngxson
gguf: display strerrno when cant load a model [ggml] · #18884 opened 2026-01-16 21:20 by teto
llama-bench: add global --seed and reduce per-token synchronization [examples] · #18879 opened 2026-01-16 17:21 by StanByriukov02
Metal : Supplement floor operator [ggml, Apple Metal] · #18878 opened 2026-01-16 15:40 by Old-cpu
Try fixing non-ASCII parameters in llama-cli on Windows [examples] · #18872 opened 2026-01-16 00:20 by forshtat
opencl: add optimized q8_0 mm kernel for adreno [ggml, OpenCL] · #18871 opened 2026-01-15 23:01 by shaofeiqi
convert_hf_to_gguf.py: refactor modify_tensors to call super [python] · #18866 opened 2026-01-15 15:01 by am17an
sampling : update outdated comment about has_sampled [no ci] · #18863 opened 2026-01-15 13:04 by danbev
sampling : add support for saving/loading backend sampling state [testing] · #18862 opened 2026-01-15 12:26 by danbev
wasm, tests: fix ctests with emscripten [build, testing, ggml] · #18861 opened 2026-01-15 12:24 by aviallon
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) [ggml] · #18860 opened 2026-01-15 10:58 by Alcpz
ggml-cpu: add RVV vec dot kernels for quantization types [ggml] · #18859 opened 2026-01-15 10:08 by rehan-10xengineer
ggml-cpu: add q4_0 repack support for wasm [ggml] · #18858 opened 2026-01-15 09:59 by aviallon
enforce response_format and json_schema for Kimi K2 [testing] · #18851 opened 2026-01-15 03:01 by akoumjian
Deepseek v3.2 dense attention support from @fairydreaming [python] · #18849 opened 2026-01-14 22:13 by createthis
[RFC] Integrate sparse-ternary-fma for TQ2_0 quantization [testing, ggml] · #18836 opened 2026-01-14 10:44 by HyperFoldUK
vulkan: Revert forced full subgroup for FlashAttention [Vulkan, ggml] · #18831 opened 2026-01-14 08:38 by rillomas
model: Add PaddleOCR-VL model support [model, examples, python] · #18825 opened 2026-01-14 06:15 by megemini
ggml-backend: Separate dynamic lib install and search paths, add relative search [ggml] · #18817 opened 2026-01-13 20:01 by DaAwesomeP
HIP: tune mmq/rocblas switching for RDNA4 [Nvidia GPU, ggml] · #18816 opened 2026-01-13 16:19 by jiachengjason
sampling : remove sampling branching in output_reserve · #18811 opened 2026-01-13 15:10 by danbev