ggerganov/llama.cpp
Pull Requests
Open
loader: map sparse mmap tensor ranges
#25309 opened 2026-07-04 20:59 by
Takinggg
json-schema-to-grammar: harden _visit_pattern against malformed regex patterns
#25308 opened 2026-07-04 19:09 by
professor-moody
ui: restore Ctrl+B sidebar toggle shortcut
server/ui
#25307 opened 2026-07-04 18:49 by
ServeurpersoCom
server-context: chain the caller-provided load_progress_callback instead of discarding it
server
#25306 opened 2026-07-04 18:28 by
mtmcp
cuda : concat implementation for quantized types
ggml
merge ready
CUDA
#25303 opened 2026-07-04 15:39 by
fairydreaming
ui: fake 200 for proxy DELETE req
server/ui
#25298 opened 2026-07-04 09:43 by
ngxson
kv-cache : fix SWA state save/load round-trip past n_swa
#25297 opened 2026-07-04 09:32 by
letruongthanh3698
llama : stream MoE routed experts from disk
#25294 opened 2026-07-04 04:51 by
freedomljc
scripts : use HF_TOKEN when downloading UI assets
#25280 opened 2026-07-03 18:17 by
angt
llama-batch: enable parallel sequences for partial rollback
model
#25278 opened 2026-07-03 17:50 by
am17an
common : resolve non-positive --threads to the number of math cores
documentation
testing
server
#25277 opened 2026-07-03 17:45 by
samagameditation-byte
ggml-backend-meta: abort if we see a multi buffer
ggml
#25276 opened 2026-07-03 17:05 by
netrunnereve
CANN: Refactor `#ifdef` blocks to avoid unreachable code after `return`
ggml
Ascend NPU
#25273 opened 2026-07-03 16:35 by
rauletorresc
ggml, server: add ggml_backend_dev_reset() for sleep mode
Vulkan
server
ggml
SYCL
Apple Metal
Ascend NPU
OpenCL
IBM zDNN
Hexagon
CUDA
AMD ZenDNN
OpenVINO
WebGPU
#25271 opened 2026-07-03 15:41 by
ngxson
[SYCL] support OP OPT_STEP_ADAMW, OPT_STEP_SGD
documentation
ggml
merge ready
SYCL
#25268 opened 2026-07-03 12:11 by
arthw
[SYCL] support op get_rows_back, only support fp32/fp16
documentation
ggml
SYCL
#25266 opened 2026-07-03 11:47 by
arthw
[SYCL] support op col2im_1d
documentation
ggml
SYCL
#25264 opened 2026-07-03 09:54 by
arthw
Kmoren/add penalties cu backend
testing
#25262 opened 2026-07-03 08:13 by
kmorennv
feat: add --threads-all option to llama-bench
examples
#25261 opened 2026-07-03 06:48 by
xiaobai0529
vulkan: fix 32-bit integer overflow in CEIL_DIV
Vulkan
ggml
#25245 opened 2026-07-02 17:23 by
hokanosekai
vulkan: for small AMD GPUs, reduce submission threshold based on CU count
Vulkan
ggml
#25240 opened 2026-07-02 12:42 by
0cc4m
common: Set optimal default thread count for ppc (Linux as well as AIX)
#25237 opened 2026-07-02 10:46 by
shalinib-ibm
[SYCL] support OP cross_entropy_loss, cross_entropy_loss_back
documentation
ggml
merge ready
SYCL
#25236 opened 2026-07-02 10:33 by
arthw
common,server : fix custom preset dedup against cached models
server
#25235 opened 2026-07-02 10:21 by
angt
[UT] enhance UT to show all real unsupported backends
testing
#25234 opened 2026-07-02 10:00 by
arthw
llama : clear error when MTP draft shares KV cache across backends
#25232 opened 2026-07-02 09:51 by
liminfei-amd
[SYCL] fix unsupported UT cases of CONT & CPY
documentation
ggml
merge ready
SYCL
#25231 opened 2026-07-02 09:47 by
arthw
Ensure unique node names and add org_src to track the org tensor for OpenVINO backend
testing
ggml
#25230 opened 2026-07-02 09:32 by
zhaixuejun1993
vulkan: when using transfer queue for async copies, sync on event_wait to avoid race
Vulkan
ggml
#25229 opened 2026-07-02 09:10 by
0cc4m
CUDA: Support CUDA Virtual Devices
ggml
CUDA
#25228 opened 2026-07-02 09:06 by
anavp-nvidia