ggml-org/llama.cpp
Open Pull Requests
#15884  HIP: use v_dot2_f32_f16 instruction for FA  [Nvidia GPU, ggml]  opened 2025-09-08 23:07 by JohannesGaessler
#15882  CUDA: fix GET_ROWS for large tensors  [Nvidia GPU, ggml]  opened 2025-09-08 20:58 by JohannesGaessler
#15881  contrib : add notes about merging PRs  opened 2025-09-08 18:37 by ggerganov
#15877  feat: Extra debugging support for model conversion  [examples, python]  opened 2025-09-08 11:22 by pwilkin
#15872  CUDA: Add `fastdiv` and `fastmodulo` to `k_bin_bcast*`, giving 1-3% E2E performance  [Nvidia GPU, ggml]  opened 2025-09-08 08:43 by ORippler
#15871  model-conversion : add embedding prompt file support  [examples, python]  opened 2025-09-08 08:14 by danbev
#15866  opencl: support ne3 in `get_rows`  [ggml, OpenCL]  opened 2025-09-08 06:02 by lhez
#15864  Fix loongarch lsx compilation error  [ggml]  opened 2025-09-08 03:01 by junchao-loongson
#15863  CANN: Format ggml-cann src code using clang-format  [ggml, Ascend NPU]  opened 2025-09-08 02:50 by noemotiovon
#15862  vulkan: fix failing dequant shaders  [Vulkan, ggml]  opened 2025-09-08 02:36 by jeffbolznv
#15861  vulkan: Fix OOB accesses in soft_max_back  [testing, Vulkan, ggml]  opened 2025-09-07 23:32 by jeffbolznv
#15860  llama: print memory breakdown on exit  opened 2025-09-07 21:35 by JohannesGaessler
#15853  CUDA: print CUDART_VERSION on init  [Nvidia GPU, ggml]  opened 2025-09-07 08:29 by JohannesGaessler
#15852  Feat: Apertus model implementation  [python, ggml]  opened 2025-09-07 08:14 by pwilkin
#15851  Modernize use nullptr  [android, examples]  opened 2025-09-07 08:01 by distlibs
#15839  ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free  [documentation, ggml, IBM zDNN]  opened 2025-09-06 14:15 by taronaeo
#15837  llguidance : use attrs to determine special tokens  opened 2025-09-06 13:34 by dstoc
#15832  metal : make the backend async  [ggml, Apple Metal]  opened 2025-09-06 09:28 by ggerganov
#15828  requirements : update transformers/torch for Embedding Gemma  [testing, examples, python, server]  opened 2025-09-06 05:04 by danbev
#15826  webgpu : fix build on emscripten  [build, script, testing, ggml]  opened 2025-09-05 23:22 by ngxson
#15824  Add support for Qwen3-Reranker  [examples, python, server]  opened 2025-09-05 21:21 by iamlemec
#15822  Improve user-facing warnings for chat template and context length  [examples]  opened 2025-09-05 19:37 by vinkal-chudgar
#15820  webui: fix Seed-OSS thinking block  [examples, server]  opened 2025-09-05 16:51 by ServeurpersoCom
#15818  Rewrite llama-run to use llama-server  [examples]  opened 2025-09-05 14:06 by ericcurtin
#15815  ggml : split graph allocations according to backend max buffer size  [testing, Vulkan, ggml]  opened 2025-09-05 09:42 by Acly
#15814  CANN: implement LRU cache for ACL graphs in CANN backend  [documentation, ggml, Ascend NPU]  opened 2025-09-05 09:19 by noemotiovon
#15813  CUDA: Conv2d Tensor Core  [Nvidia GPU, ggml]  opened 2025-09-05 06:36 by mnehete32
#15805  Add conv2d Implicit GEMM  [testing, Nvidia GPU, ggml]  opened 2025-09-04 20:16 by bssrdf
#15800  vulkan: add mul_mat variant for embedded gpus  [Vulkan, ggml]  opened 2025-09-04 16:25 by rmatif
#15797  ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type  [Nvidia GPU, Vulkan, examples, ggml]  opened 2025-09-04 15:14 by slaren