ggml-org/llama.cpp
Open Pull Requests
cuda : fix supports_op condition for get_rows when src1->ne2 > 1 [Nvidia GPU, ggml] (#15868 opened 2025-09-08 07:15 by ggerganov)
convert : force setting sliding_window from original config [python] (#15867 opened 2025-09-08 06:54 by danbev)
opencl: support ne3 in `get_rows` [ggml, OpenCL] (#15866 opened 2025-09-08 06:02 by lhez)
Fix loongarch lsx compilation error [ggml] (#15864 opened 2025-09-08 03:01 by junchao-loongson)
CANN: Format ggml-cann src code using clang-format [ggml, Ascend NPU] (#15863 opened 2025-09-08 02:50 by noemotiovon)
vulkan: fix failing dequant shaders [Vulkan, ggml] (#15862 opened 2025-09-08 02:36 by jeffbolznv)
vulkan: Fix OOB accesses in soft_max_back [testing, Vulkan, ggml] (#15861 opened 2025-09-07 23:32 by jeffbolznv)
llama: print memory breakdown on exit (#15860 opened 2025-09-07 21:35 by JohannesGaessler)
metal : refactor + optimize [testing, ggml, Apple Metal] (#15857 opened 2025-09-07 16:34 by ggerganov)
CUDA: print CUDART_VERSION on init [Nvidia GPU, ggml] (#15853 opened 2025-09-07 08:29 by JohannesGaessler)
Feat: Apertus model implementation [python, ggml] (#15852 opened 2025-09-07 08:14 by pwilkin)
Modernize use nullptr [android, examples] (#15851 opened 2025-09-07 08:01 by distlibs)
vulkan: sort graph to allow more parallel execution [Nvidia GPU, Vulkan, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL, IBM zDNN] (#15850 opened 2025-09-07 04:20 by jeffbolznv)
ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free [documentation, ggml, IBM zDNN] (#15839 opened 2025-09-06 14:15 by taronaeo)
llguidance : use attrs to determine special tokens (#15837 opened 2025-09-06 13:34 by dstoc)
metal : make the backend async [ggml, Apple Metal] (#15832 opened 2025-09-06 09:28 by ggerganov)
json : support `enum` values within `allOf` [testing, examples, python, server] (#15830 opened 2025-09-06 07:27 by aldehir)
requirements : update transformers/torch for Embedding Gemma [testing, examples, python, server] (#15828 opened 2025-09-06 05:04 by danbev)
webgpu : fix build on emscripten [build, script, testing, ggml] (#15826 opened 2025-09-05 23:22 by ngxson)
Add support for Qwen3-Reranker [examples, python, server] (#15824 opened 2025-09-05 21:21 by iamlemec)
Improve user-facing warnings for chat template and context length [examples] (#15822 opened 2025-09-05 19:37 by vinkal-chudgar)
webui: fix Seed-OSS thinking block [examples, server] (#15820 opened 2025-09-05 16:51 by ServeurpersoCom)
Rewrite llama-run to use llama-server [examples] (#15818 opened 2025-09-05 14:06 by ericcurtin)
ggml : split graph allocations according to backend max buffer size [testing, Vulkan, ggml] (#15815 opened 2025-09-05 09:42 by Acly)
CANN: implement LRU cache for ACL graphs in CANN backend [documentation, ggml, Ascend NPU] (#15814 opened 2025-09-05 09:19 by noemotiovon)
CUDA: Conv2d Tensor Core [Nvidia GPU, ggml] (#15813 opened 2025-09-05 06:36 by mnehete32)
Add conv2d Implicit GEMM [testing, Nvidia GPU, ggml] (#15805 opened 2025-09-04 20:16 by bssrdf)
vulkan: add mul_mat variant for embedded gpus [Vulkan, ggml] (#15800 opened 2025-09-04 16:25 by rmatif)
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type [Nvidia GPU, Vulkan, examples, ggml] (#15797 opened 2025-09-04 15:14 by slaren)
Add docker:// protocol support for llama-server model pulling (#15790 opened 2025-09-04 11:38 by ericcurtin)