ggerganov/llama.cpp
Pull Requests
common : add character class support to glob_match
#21111 opened 2026-03-28 09:32 by CISC
ci : gracefully shut down the server
examples
python
server
#21110 opened 2026-03-28 09:05 by angt
ci : add --reuse-port
examples
python
server
#21109 opened 2026-03-28 08:43 by angt
server : fix processing of multiple back-to-back mtmd chunks
examples
server
#21107 opened 2026-03-28 08:16 by ggerganov
convert: Add compressed-tensors NVFP4 conversion
python
#21095 opened 2026-03-28 04:09 by michaelw9999
common : add reasoning_format = none support to gpt-oss
testing
#21094 opened 2026-03-28 03:50 by aldehir
[SYCL] Enhance build script to use half cores to build, avoid OS hang
examples
SYCL
#21093 opened 2026-03-28 03:07 by arthw
gguf: add big-endian magic for self-describing endianness
testing
python
ggml
#21092 opened 2026-03-28 02:27 by Scottcjn
server/webui: cleanup dual representation approach, simplify to openai-compat
examples
server
#21090 opened 2026-03-28 00:06 by pwilkin
ggml : add CPU TurboQuant KV cache types (TBQ3_0 / TBQ4_0)
testing
examples
server
ggml
#21089 opened 2026-03-27 23:59 by elusznik
[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel
Nvidia GPU
ggml
#21086 opened 2026-03-27 22:14 by gaugarg-nv
common: add bounds check in common_init_result::sampler to prevent segfault on failed model load
testing
examples
#21082 opened 2026-03-27 18:58 by mtmcp
devops: SYCL: upgrade compute-runtime
devops
#21076 opened 2026-03-27 17:24 by WizardlyBump17
fix cmake problem to exclude CCAN
need more info
ggml
Ascend NPU
#21075 opened 2026-03-27 16:59 by sunqingn7
ggml-cuda: Add generic NVFP4 MMQ kernel
Nvidia GPU
ggml
#21074 opened 2026-03-27 16:50 by michaelw9999
server: (webui) no more gzip compression
examples
server
#21073 opened 2026-03-27 16:16 by ngxson
server: wrap headers for mcp proxy
examples
server
#21072 opened 2026-03-27 15:18 by ngxson
hexagon: optimize HMX matmul operations
ggml
Hexagon
#21071 opened 2026-03-27 15:05 by chraac
Add quantization recipes from custom recipe files
testing
examples
#21070 opened 2026-03-27 14:52 by bartowski1182
ggml: allow prefetching tensor overrides
Nvidia GPU
examples
ggml
SYCL
Ascend NPU
OpenCL
IBM zDNN
OpenVINO
WebGPU
#21067 opened 2026-03-27 14:02 by am17an
[HIP] Bump ROCm version to 7.2.1
devops
#21066 opened 2026-03-27 13:30 by slojosic-amd
ggml : use 64 bytes aligned tile buffers
ggml
#21058 opened 2026-03-27 07:45 by angt
ggml webgpu: update Vulkan backend CI to use self-hosted runner
devops
ggml
WebGPU
#21052 opened 2026-03-27 02:58 by reeselevine
Add the tests that we want to run on external CI
script
python
devops
#21051 opened 2026-03-27 01:46 by shreyajn
ggml webgpu: move quantized buffers to u32 types and some other changes for wider browser/device support
ggml
WebGPU
#21046 opened 2026-03-26 22:25 by reeselevine
model: add Falcon OCR support
model
examples
python
ggml
Apple Metal
#21045 opened 2026-03-26 22:07 by avirajBevli
llama : rotate activations for better quantization
#21038 opened 2026-03-26 18:14 by ggerganov
webui: Add option to pre-encode conversation for faster next turns
examples
server
#21034 opened 2026-03-26 16:08 by allozaur
model: add support for nvidia/gpt-oss-puzzle-88B
model
python
#21032 opened 2026-03-26 15:38 by smpurkis
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl
Vulkan
ggml
#21029 opened 2026-03-26 15:03 by mkoker