ggml-org/llama.cpp
Pull Requests
Commits
Open
Closed
fix : Dangling pointer for non-empty trigger words in lazy grammar construction
#17048 opened 2025-11-06 10:24 by marek-hradil
kv-cache : pad the size of the small SWA cache for performance
#17046 opened 2025-11-06 08:17 by ggerganov
Add MoE dynamic routing with expert caching
documentation
build
examples
#17044 opened 2025-11-06 05:11 by jmangold23
ggml-hexagon: fix `test-backend-ops` failures on specific binary ops
ggml
#17042 opened 2025-11-06 02:09 by chraac
server/public_simplechat alternate web client ui with 0 setup builtin tool calling++, reasoning - refactored, SysDateTime, rename pdftext
examples
python
server
#17038 opened 2025-11-05 23:32 by hanishkvc
common: "Profile Guided Speculative Decoding"
#17034 opened 2025-11-05 18:46 by jukofyork
CUDA: only use moe_expert_reduce when n_tokens=1
Nvidia GPU
ggml
#17032 opened 2025-11-05 17:08 by am17an
ggml webgpu: faster matrix multiplication/matrix-vector multiplication
python
devops
ggml
#17031 opened 2025-11-05 17:02 by reeselevine
ggml-cpu: handle 3d tensors in repack mat_mul
ggml
#17030 opened 2025-11-05 16:59 by Alcpz
tests(test-backend-ops): Test backend ops verbosity
testing
#17029 opened 2025-11-05 16:57 by gabe-l-hart
examples(eval-callback): Eval callback verbosity
examples
#17028 opened 2025-11-05 16:45 by gabe-l-hart
vulkan: Fix test-thread-safety crashes
Vulkan
ggml
#17024 opened 2025-11-05 15:46 by jeffbolznv
cuda/vulkan : bicubic interpolation
testing
Nvidia GPU
Vulkan
ggml
OpenCL
#17022 opened 2025-11-05 12:11 by Acly
ci: add Arm-hosted Graviton4 runner
devops
#17021 opened 2025-11-05 11:57 by sudhiarm
memory: Hybrid context shift
examples
#17009 opened 2025-11-04 20:52 by gabe-l-hart
webui: fix keyboard shortcuts for new chat & edit chat title
examples
server
#17007 opened 2025-11-04 18:38 by chansikpark
sampling : add support for GPU sampling (wip)
testing
Nvidia GPU
ggml
#17004 opened 2025-11-04 17:34 by danbev
Q4/Q8 Tiled Gemm Optimization.
ggml
#16999 opened 2025-11-04 13:48 by shalinib-ibm
kleidiai: add optimized per-channel kernels for Q8_0
ggml
#16993 opened 2025-11-04 10:07 by chaxu01
CUDA: add stream-based concurrency
Nvidia GPU
ggml
#16991 opened 2025-11-04 09:25 by am17an
CUDA: fix crash on uneven context
testing
Nvidia GPU
ggml
#16988 opened 2025-11-04 07:54 by JohannesGaessler
Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures)
testing
Nvidia GPU
Vulkan
ggml
#16985 opened 2025-11-04 00:22 by Phylliida
Mamba2 SSD
model
testing
Nvidia GPU
examples
ggml
Apple Metal
#16982 opened 2025-11-03 22:40 by gabe-l-hart
vulkan: Use spec constants for conv2d s/d/p and kernel W/H
Vulkan
ggml
#16978 opened 2025-11-03 20:42 by jeffbolznv
vulkan: fuse rms_norm + mul + rope (+ view + set_rows)
testing
Vulkan
ggml
#16977 opened 2025-11-03 19:05 by jeffbolznv
webui: Add a "Continue" Action for Assistant Message
examples
server
#16971 opened 2025-11-03 15:14 by allozaur
sycl: flash-attention implementation
ggml
SYCL
#16969 opened 2025-11-03 13:11 by ye-NX
s390x: disable vxe for cross-compilation by default
ggml
#16966 opened 2025-11-03 11:49 by AlekseiNikiforovIBM
Refactor llm_chat_template_from_str to avoid throwing exceptions
#16965 opened 2025-11-03 11:49 by AnonN10
CUDA: add implicit conv3d
testing
Nvidia GPU
ggml
#16948 opened 2025-11-02 17:38 by bssrdf