ggerganov/llama.cpp
Pull Requests
Open
server: Improve get_datetime tool to return structured ISO-8601 JSON
examples
server
#22843 opened 2026-05-08 15:29 by
yamc12
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp
ggml
#22841 opened 2026-05-08 14:36 by
martin-klacer-arm
spec : parallel drafting support
examples
server
#22838 opened 2026-05-08 12:44 by
ggerganov
ggml-cpu : add STQ1_0 ternary quantization with ARM NEON vec_dot kernel
examples
python
ggml
#22836 opened 2026-05-08 11:07 by
sjl623
convert_hf_to_gguf: fix Qwen3.5 linear_num_value_heads overridden by AutoConfig defaults
python
#22835 opened 2026-05-08 10:56 by
LucaSforza
ggml : add ggml_conv_1d_grouped
testing
ggml
#22833 opened 2026-05-08 10:15 by
Juste-Leo2
convert : add split() to LoraTorchTensor in LoRA converter
testing
python
#22832 opened 2026-05-08 08:42 by
jesus-talavera-ibm
webui: support video files as input
server/webui
examples
server
#22830 opened 2026-05-08 07:49 by
foldl
server: preserve context checkpoint coverage
examples
server
#22826 opened 2026-05-08 00:53 by
jacekpoplawski
Feature hexagon tri
ggml
Hexagon
#22822 opened 2026-05-07 20:48 by
pdhinaka
HIP: Adds 4x packed Q8_1 activation for Q4_K_M models in MMVQ
Nvidia GPU
ggml
#22821 opened 2026-05-07 20:45 by
jiachengjason
ggml-webgpu: address precision issues for multimodal
ggml
WebGPU
#22808 opened 2026-05-07 16:57 by
Constannnnnt
Add new config file options for saving and loading configuration for llama tools in INI format
#22802 opened 2026-05-07 13:04 by
bartowski1182
webui: page title use app name variable
server/webui
examples
server
#22801 opened 2026-05-07 12:58 by
jpm-canonical
mtmd-cli: load GPU backends before arg parsing to fix false 'no GPU' warning
examples
#22790 opened 2026-05-07 09:07 by
saga08003137
ggml: use dynamic allocation for split graph inputs
ggml
#22789 opened 2026-05-07 09:02 by
AgoraPete
spec : refactor ctx
examples
python
server
ggml
Apple Metal
#22787 opened 2026-05-07 06:55 by
ggerganov
convert : add `--fuse-qkv` flag to fuse Q/K/V into QKV during HF-to-GGUF conversion
model
python
#22780 opened 2026-05-07 02:24 by
JoursBleu
ggml-sycl : use malloc_shared for UMA/integrated GPU devices
ggml
SYCL
#22766 opened 2026-05-06 15:11 by
vmartirosyan
Draft: ggml-opencl: Early proof-of-concept implementation of plans via command buffers
ggml
OpenCL
#22764 opened 2026-05-06 14:20 by
jansol
android: extract GgufMetadataReader factory to break cyclic dependency
android
examples
#22763 opened 2026-05-06 14:12 by
Juste-Leo2
server: fix /infill prompt placement after FIM_MID
examples
server
#22761 opened 2026-05-06 13:30 by
Aayush7g
ggml-opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill
ggml
OpenCL
#22755 opened 2026-05-06 11:42 by
happyyzy
ggml-cpu: extend RVV quantization vec dot to higher VLENs
ggml
#22754 opened 2026-05-06 10:45 by
rehan-10xengineer
Add more wav-compatible MIME types and enhance MIME type normalization
server/webui
examples
server
#22744 opened 2026-05-06 04:08 by
guangchenli
webui : [ChatFormActionAdd][a11y] fix accessibility issues in add menu trigger and items
server/webui
examples
server
#22736 opened 2026-05-06 01:41 by
vignesh191
llama : extend embeddings API
model
#22728 opened 2026-05-05 17:55 by
ggerganov
server, webui: support continue generation on reasoning models
server/webui
examples
server
#22727 opened 2026-05-05 17:15 by
ServeurpersoCom
Filter tools openai server task
examples
server
#22725 opened 2026-05-05 15:57 by
sonic182
Adding support for the granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2 models)
model
python
#22716 opened 2026-05-05 13:48 by
hansolosan