ggml-org/llama.cpp: Pull Requests (Open)
- #19959 Use fp32 in cuBLAS V100 to avoid overflows, env variables to override cuBLAS compute type [Nvidia GPU, ggml] (opened 2026-02-27 19:25 by wallentri88)
- #19958 llama : add native param2moe architecture support [model, python] (opened 2026-02-27 19:01 by iambhuvan)
- #19954 tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice [examples] (opened 2026-02-27 14:37 by angt)
- #19952 scripts : improve get-wikitext-2.sh [script] (opened 2026-02-27 13:19 by angt)
- #19941 [New quant] Q3_PT [examples, python, ggml] (opened 2026-02-26 22:15 by pwilkin)
- #19939 webui: use date in more human readable exported filename [examples, server] (opened 2026-02-26 20:15 by woof-dog)
- #19938 scripts: ini_to_opencode.py [script, python] (opened 2026-02-26 18:13 by am17an)
- #19936 fix dots.ocr: correct RoPE sections and FFN tensor mapping [examples, python] (opened 2026-02-26 16:27 by anthony-maio)
- #19934 common : update completion executables list [no ci] (opened 2026-02-26 14:12 by danbev)
- #19931 tool parser: add GigaChatV3/3.1 models support in PEG format [testing] (opened 2026-02-26 13:07 by Mishusha)
- #19927 metal: add CONV_3D [ggml, Apple Metal] (opened 2026-02-26 11:24 by Ra5hidIslam)
- #19922 llama/ggml: multi-GPU pipeline parallelism (xdev host staging) + faster model loading [Nvidia GPU, ggml] (opened 2026-02-26 09:43 by mxxm-t)
- #19916 ggml-cuda: add mem check for fusion [Nvidia GPU, ggml] (opened 2026-02-26 05:53 by am17an)
- #19914 vendors: update miniaudio library to 0.11.24 [script, python] (opened 2026-02-26 04:37 by data-man)
- #19896 test-backend-ops: allow loading tests from JSON and parsing model operators into JSON [testing, examples] (opened 2026-02-25 15:13 by 0cc4m)
- #19861 [ggml-quants] Add memsets and other fixes for IQ quants [ggml] (opened 2026-02-24 20:04 by bartowski1182)
- #19855 server : add default-model preset and fallback logic [examples, server] (opened 2026-02-24 16:30 by mikhail-shevtsov-wiregate)
- #19850 ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1` in binary ops [testing, ggml] (opened 2026-02-24 12:20 by yomaytk)
- #19841 server : add chat truncation to keep chat going [testing, examples, python, server] (opened 2026-02-23 21:36 by ltoniazzi)
- #19840 opencl: add optimized q4_1 mm kernel for adreno [ggml, OpenCL] (opened 2026-02-23 21:36 by shaofeiqi)
- #19833 sampling : support multiple outputs per sequence [testing, examples, server] (opened 2026-02-23 13:51 by danbev)
- #19832 Add Aya 101 multi-lingual translation support to llama.cpp [examples, python, server] (opened 2026-02-23 13:23 by Acceldium)
- #19828 common : refactor cache to use hierarchical directory layout (opened 2026-02-23 12:30 by angt)
- #19827 Kimi Linear block implementation [model] (opened 2026-02-23 12:14 by ymcki)
- #19812 implemented max pooling for embeddings [examples, python, server] (opened 2026-02-22 19:03 by lorenzocesconetto)
- #19802 llama: end-to-end tests [model, testing] (opened 2026-02-22 10:59 by JohannesGaessler)
- #19796 Add model metadata loading from huggingface for use with tests requiring real model data [testing] (opened 2026-02-22 05:07 by bartowski1182)
- #19791 tools : add learning-cache tool for persistent latent context [examples] (opened 2026-02-22 00:55 by arkavo-com)
- #19780 [WIP] ggml-hexagon: convert f32 to f16 - fa opt part4 [ggml] (opened 2026-02-21 15:13 by chraac)
- #19772 Clean up per-thread parameter buffer pool and job submission logic [ggml] (opened 2026-02-20 23:27 by nikhilJain17)