ggml-org/llama.cpp
Open Pull Requests
Added note for compiling on integrated GPUs [documentation] #18633 opened 2026-01-06 04:58 by alosslessdev
vulkan: optimize ssm_scan [Vulkan, ggml] #18630 opened 2026-01-05 22:39 by jeffbolznv
ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. [ggml] #18628 opened 2026-01-05 16:28 by yomaytk
rpc : implement event and async backend APIs [ggml] #18626 opened 2026-01-05 15:11 by rgerganov
CANN: Remove unused functions [ggml, Ascend NPU] #18625 opened 2026-01-05 15:10 by rauletorresc
CANN: Rename `get_env` to `get_env_as_lowercase` [ggml, Ascend NPU] #18624 opened 2026-01-05 14:50 by rauletorresc
sampling: add tail-free (TFS) sampling #18612 opened 2026-01-05 06:45 by viralvgupta
Hexagon add support for f16/f32 flash attention, scale, set-rows and improve f16/32 matmul [ggml] #18611 opened 2026-01-05 06:06 by max-krasnyansky
ggml webgpu: initial flashattention implementation [ggml] #18610 opened 2026-01-05 05:37 by reeselevine
Fix grammar parsing issues to prevent stack overflow and hangs [testing] #18604 opened 2026-01-05 02:16 by aagit
common: build as shared library when BUILD_SHARED_LIBS is ON #18602 opened 2026-01-05 00:11 by rsauciuc
memory : add llama_memory_hybrid_iswa #18601 opened 2026-01-04 23:23 by tdakhran
server: add group support for router mode model presets [examples, server] #18600 opened 2026-01-04 20:29 by ssam18
ggml: fix assertion in ggml_build_backward_expand for inplace operations [ggml] #18589 opened 2026-01-04 09:37 by nlasky2000-dot
mtmd : fix integer overflow when n_tokens equals INT32_MIN [examples] #18588 opened 2026-01-04 09:21 by ylwango613
ggml-backend: allow free = 0 and total = 0 to use host memory info [ggml, OpenCL] #18587 opened 2026-01-04 08:53 by taronaeo
Fix division by zero vulnerability in gguf_init_from_file_impl [ggml] #18586 opened 2026-01-04 08:34 by ylwango613
gguf-hash: add RVV tensor hashing using xxh3 [examples] #18576 opened 2026-01-04 02:11 by ixgbe
GGML RPC - Add support for Unix Domain Sockets [examples, ggml] #18574 opened 2026-01-03 22:44 by struct
add option --tensor-type-file to llama-quantize [examples] #18572 opened 2026-01-03 21:01 by EugeoSynthesisThirtyTwo
llama: max ctx by default, fix fit magic number [testing, examples] #18567 opened 2026-01-03 17:37 by JohannesGaessler
cuda : check src shapes for CUDA graphs [Nvidia GPU, ggml] #18561 opened 2026-01-03 10:12 by ggerganov
llama-bench: Add --no-fail (-nf) option [examples] #18554 opened 2026-01-02 21:03 by SamAcctX
server : add thinking content blocks to Anthropic Messages API [examples, python, server] #18551 opened 2026-01-02 19:29 by 686f6c61
ggml : add ggml_build_forward_select [model, Nvidia GPU, Vulkan, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL, IBM zDNN] #18550 opened 2026-01-02 17:35 by ggerganov
graph : constant topology for tokens/embeddings inputs [model] #18549 opened 2026-01-02 13:50 by ggerganov
context : reserve new scheduler when graph topology changes #18547 opened 2026-01-02 13:31 by ggerganov
Add EXAONE MoE implementations [model, examples, python, server] #18543 opened 2026-01-02 09:36 by nuxlear
chat : add parsing for solar-open-100b [testing] #18540 opened 2026-01-02 08:20 by aldehir
CUDA: cache intermediate tensors [Nvidia GPU, ggml] #18538 opened 2026-01-02 07:19 by am17an