ggerganov/llama.cpp
Pull Requests
Open
CI: Fix docker multiarch overwrite
devops
#21144 opened 2026-03-29 07:09 by
Ts-sound
common: add two-phase graceful reasoning budget termination ...
#21141 opened 2026-03-29 03:03 by
zeel2104
grammar: make MAX_REPETITION_THRESHOLD configurable via env var
#21139 opened 2026-03-29 01:12 by
vampyrebat
Multi-backend profiler
Nvidia GPU
Vulkan
examples
python
ggml
SYCL
Ascend NPU
OpenCL
Hexagon
OpenVINO
#21138 opened 2026-03-29 00:21 by
pwilkin
hexagon: dma optimizations (mostly fixing regressions)
ggml
Hexagon
#21137 opened 2026-03-28 23:48 by
max-krasnyansky
CI: Enable CUDA ARM64 runners
documentation
devops
#21122 opened 2026-03-28 15:26 by
ehfd
metal: add opt-in V skip for negligible attention weights
ggml
Apple Metal
#21119 opened 2026-03-28 13:09 by
TheTom
convert: Add compressed-tensors NVFP4 conversion
python
#21095 opened 2026-03-28 04:09 by
michaelw9999
server/webui: cleanup dual representation approach, simplify to openai-compat
examples
server
#21090 opened 2026-03-28 00:06 by
pwilkin
ggml : add CPU TurboQuant KV cache types (TBQ3_0 / TBQ4_0)
testing
examples
server
ggml
#21089 opened 2026-03-27 23:59 by
elusznik
[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel
Nvidia GPU
ggml
#21086 opened 2026-03-27 22:14 by
gaugarg-nv
common: add bounds check in common_init_result::sampler to prevent segfault on failed model load
testing
examples
#21082 opened 2026-03-27 18:58 by
mtmcp
fix cmake problem to exclude CCAN
need more info
ggml
Ascend NPU
#21075 opened 2026-03-27 16:59 by
sunqingn7
ggml-cuda: Add generic NVFP4 MMQ kernel
Nvidia GPU
python
ggml
#21074 opened 2026-03-27 16:50 by
michaelw9999
server: (webui) no more gzip compression
server/webui
examples
server
#21073 opened 2026-03-27 16:16 by
ngxson
server: wrap headers for mcp proxy
examples
server
#21072 opened 2026-03-27 15:18 by
ngxson
hexagon: optimize HMX matmul operations
ggml
Hexagon
#21071 opened 2026-03-27 15:05 by
chraac
Add quantization recipes from custom recipe files
testing
examples
#21070 opened 2026-03-27 14:52 by
bartowski1182
ggml: allow prefetching tensor overrides
Nvidia GPU
examples
ggml
SYCL
Ascend NPU
OpenCL
IBM zDNN
OpenVINO
WebGPU
#21067 opened 2026-03-27 14:02 by
am17an
[HIP] Bump ROCm version to 7.2.1
devops
#21066 opened 2026-03-27 13:30 by
slojosic-amd
ggml : use 64 bytes aligned tile buffers
ggml
#21058 opened 2026-03-27 07:45 by
angt
ggml webgpu: update Vulkan backend CI to use self-hosted runner
devops
ggml
WebGPU
#21052 opened 2026-03-27 02:58 by
reeselevine
Add the tests that we want to run on external CI
script
python
devops
#21051 opened 2026-03-27 01:46 by
shreyajn
ggml webgpu: move quantized buffers to u32 types and some other changes for wider browser/device support
ggml
WebGPU
#21046 opened 2026-03-26 22:25 by
reeselevine
model: add Falcon OCR support
model
examples
python
ggml
Apple Metal
#21045 opened 2026-03-26 22:07 by
avirajBevli
llama : rotate activations for better quantization
#21038 opened 2026-03-26 18:14 by
ggerganov
webui: Add option to pre-encode conversation for faster next turns
examples
server
#21034 opened 2026-03-26 16:08 by
allozaur
model: add support for nvidia/gpt-oss-puzzle-88B
model
python
#21032 opened 2026-03-26 15:38 by
smpurkis
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl
Vulkan
ggml
#21029 opened 2026-03-26 15:03 by
mkoker
Vulkan Q4_0 Repack PoC
Vulkan
ggml
#21024 opened 2026-03-26 12:03 by
0cc4m