ggml-org/llama.cpp
Pull Requests
CUDA: fix should_use_mmvf for ne11 == 1
#17085 opened 2025-11-07 16:34 by
JohannesGaessler
convert: (demo) repacking compressed_tensor format of kimi-k2
python
#17083 opened 2025-11-07 15:36 by
ngxson
fixed a missing case for transposed copy
testing
Nvidia GPU
ggml
#17081 opened 2025-11-07 15:20 by
bssrdf
CUDA: skip fusion for repeating adds in bias
testing
Nvidia GPU
ggml
#17080 opened 2025-11-07 14:59 by
am17an
server : handle failures to restore host cache
examples
server
#17078 opened 2025-11-07 13:58 by
ggerganov
HIP: RDNA4 tensor core support for MMF
Nvidia GPU
ggml
#17077 opened 2025-11-07 13:37 by
zhang-hui-yulo
docs: fix typos in some files
#17074 opened 2025-11-07 12:09 by
khanhkhanhlele
arg: add --cache-list argument to list cached models
#17073 opened 2025-11-07 12:07 by
ngxson
[RFC] ggml: new backend for API Remoting
build
ggml
Apple Metal
#17072 opened 2025-11-07 11:15 by
kpouget
convert : handle compressed-tensors quant method
enhancement
python
#17069 opened 2025-11-07 03:03 by
compilade
Fix NetBSD compilation error
#17068 opened 2025-11-07 02:01 by
xinitrcn1
Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
documentation
testing
Nvidia GPU
ggml
#17063 opened 2025-11-06 21:04 by
pwilkin
cmake: add option to build and link BoringSSL
build
#17062 opened 2025-11-06 20:11 by
angt
[WIP] s390x ci: debug build issue
devops
#17053 opened 2025-11-06 14:06 by
AlekseiNikiforovIBM
# Add Megrez-MoE Architecture Support ggml-org#16724
model
#17052 opened 2025-11-06 13:18 by
tamarPal
cuda: extended MMF_ROWS_PER_BLOCK
Nvidia GPU
ggml
#17051 opened 2025-11-06 13:07 by
zhang-hui-yulo
fix : Dangling pointer for non-empty trigger words in lazy grammar construction
#17048 opened 2025-11-06 10:24 by
marek-hradil
kv-cache : pad the cache size to 256 for performance
examples
python
server
#17046 opened 2025-11-06 08:17 by
ggerganov
Add MoE dynamic routing with expert caching
documentation
build
examples
#17044 opened 2025-11-06 05:11 by
jmangold23
ggml-hexagon: fix `test-backend-ops` failures on specific binary ops
ggml
#17042 opened 2025-11-06 02:09 by
chraac
server/public_simplechat alternate web client ui with 0 setup builtin tool calling++, reasoning - refactored, SysDateTime, rename pdftext
examples
python
server
#17038 opened 2025-11-05 23:32 by
hanishkvc
common: "Profile Guided Speculative Decoding"
#17034 opened 2025-11-05 18:46 by
jukofyork
CUDA: only use moe_expert_reduce when n_tokens=1
Nvidia GPU
ggml
#17032 opened 2025-11-05 17:08 by
am17an
ggml webgpu: faster matrix multiplication/matrix-vector multiplication
python
devops
ggml
#17031 opened 2025-11-05 17:02 by
reeselevine
ggml-cpu: handle 3d tensors in repack mat_mul
ggml
#17030 opened 2025-11-05 16:59 by
Alcpz
tests(test-backend-ops): Test backend ops verbosity
testing
#17029 opened 2025-11-05 16:57 by
gabe-l-hart
examples(eval-callback): Eval callback verbosity
examples
#17028 opened 2025-11-05 16:45 by
gabe-l-hart
vulkan: Fix test-thread-safety crashes
Vulkan
ggml
#17024 opened 2025-11-05 15:46 by
jeffbolznv
cuda/vulkan : bicubic interpolation
testing
Nvidia GPU
Vulkan
ggml
OpenCL
#17022 opened 2025-11-05 12:11 by
Acly
ci: add Arm-hosted Graviton4 runner
devops
#17021 opened 2025-11-05 11:57 by
sudhiarm