ggerganov/llama.cpp
Pull Requests
fix(quantize): add NVFP4 default type mapping and scale tensors
examples
#22897 opened 2026-05-10 08:34 by t-timms
[ggml] Fix Vulkan-Hpp handle usage on 32-bit targets.
Vulkan
ggml
#22892 opened 2026-05-10 04:06 by miyanyan
vendor : update cpp-httplib to 0.43.4
script
python
merge ready
#22888 opened 2026-05-09 22:37 by cabelo
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32
Vulkan
ggml
#22887 opened 2026-05-09 22:10 by TheBlueMatt
feat: add MiMo v2.5 vision
examples
python
#22883 opened 2026-05-09 20:52 by AesSedai
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD
Nvidia GPU
ggml
#22880 opened 2026-05-09 19:43 by JohannesGaessler
docs: fix metrics endpoint description in server README
examples
server
#22879 opened 2026-05-09 18:25 by willjoha
Optimise memory usage by evicting weights after processing each layer
examples
#22877 opened 2026-05-09 17:34 by EAddario
opencl: fix crash when warming up MoE on Adreno
ggml
OpenCL
#22876 opened 2026-05-09 17:23 by lhez
cli: exit conversation mode on stdin EOF
examples
#22874 opened 2026-05-09 16:54 by YuWei-CH
server : fix n_predict=-2 (generate until context full)
examples
python
server
#22873 opened 2026-05-09 16:13 by kimjune01
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend
documentation
build
devops
ggml
#22863 opened 2026-05-09 06:47 by alex-spacemit
SYCL: implement ggml_sycl_pool_vmm
ggml
SYCL
#22862 opened 2026-05-09 06:40 by sanmai
ggml-cpu: scope KleidiAI compile flags per-target via OBJECT library
ggml
#22861 opened 2026-05-09 05:36 by shreyanshp
security: fix critical integer overflow (CWE-190) in tensor allocation
ggml
#22857 opened 2026-05-08 23:45 by programacionlogicT900r1000
opencl: add q4_1 MoE for Adreno
ggml
OpenCL
#22856 opened 2026-05-08 22:47 by shawngu-quic
vulkan: fuse snake activation (mul, sin, sqr, mul, add)
Vulkan
ggml
#22855 opened 2026-05-08 21:41 by ServeurpersoCom
Add --continue-after-failure support to llama-bench for resilient benchmark sweeps
examples
#22854 opened 2026-05-08 21:29 by ssam18
docs: fix suggested cmake flag in `build.md` for including rocwmma
documentation
#22852 opened 2026-05-08 20:48 by Elliot-Roberts
Add preemptive priority scheduling
server/webui
examples
python
server
#22851 opened 2026-05-08 20:11 by lowlyocean
webui: Add max image size option
server/webui
examples
server
#22849 opened 2026-05-08 19:06 by stduhpf
cli: Add quiet mode
examples
#22848 opened 2026-05-08 19:02 by nh2
test-backend-ops: add more fields to csv output.
testing
#22844 opened 2026-05-08 16:13 by Exile333
server: Improve get_datetime tool to return structured ISO-8601 JSON
examples
server
#22843 opened 2026-05-08 15:29 by yamc12
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp
ggml
#22841 opened 2026-05-08 14:36 by martin-klacer-arm
spec : parallel drafting support
examples
server
#22838 opened 2026-05-08 12:44 by ggerganov
ggml-cpu : add STQ1_0 ternary quantization with ARM NEON vec_dot kernel
examples
python
ggml
#22836 opened 2026-05-08 11:07 by sjl623
convert_hf_to_gguf: fix Qwen3.5 linear_num_value_heads overridden by AutoConfig defaults
python
#22835 opened 2026-05-08 10:56 by LucaSforza
ggml : add ggml_conv_1d_grouped
testing
ggml
#22833 opened 2026-05-08 10:15 by Juste-Leo2
convert : add split() to LoraTorchTensor in LoRA converter
testing
python
#22832 opened 2026-05-08 08:42 by jesus-talavera-ibm