HIP: add fattn-mma-f16 for RDNA4 (llama/18481)
e94d11aa
ggml-metal: do not copy headers for embedded, use current binary dir …
241d378e
vulkan: work around Intel fp16 bug in mmq (llama/18814)
d1549657
CUDA : fix typo in clang pragma comment [no ci] (llama/18830)
848a34ae
vulkan: Check maxStorageBufferRange in supports_op (llama/18709)
a4c04139
CUDA: Factor out and re-use `block_reduce` function (llama/18785)
ed8e1b9c
hexagon: support for OP_CPY, host buffers now optional (llama/18822)
a49e6887
ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (llama/18837)
b264175e
CUDA: fix alignment on register spill for FA (llama/18815)
0ed573e2
cuda : print less debug logs when disabling cuda graphs (llama/18868)
da3b1b73
OpenCL: add SOLVE_TRI op support (llama/18846)
3aa5e7ec
CANN: support gated linear attn (llama/18653)
3355e081
CANN: fix an issue where get_env was not fully renamed (llama/18796)
b59fcb02
CANN: Remove unused `ggml_cann_get_device` function (llama/18625)
60497ff4
ggml-blas: hide warnings from included BLAS headers (llama/18818)
a5fab81a
ggml : extend ggml_pool_1d + metal (llama/16429)
cdf61be2
ggml webgpu: support for backend sampling (llama/18880)
391bdb90
opencl: fix q6_K mv for m=1 (llama/18893)
758c3c44
ggml : add ggml_build_forward_select (llama/18550)
867ca7da
metal : enable FA for MLA heads (llama/18950)
a987d92e
ggml : cleanup path_str() (llama/18928)
6d6f9866
CUDA: Replace init_offsets kernel with iterators in cub-based argsort…
cb6577c4
CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator…
afe234b9
vulkan: Use mul_mat_vec_id for small values of n (llama/18918)
85e51c0b
Revert "vulkan: force full subgroups for flash attention to fix intel…
cf88fb0a
vulkan: support flash attention GQA/split_k with small batches (llama…
f1c37d25
vulkan: Remove transfer_ctx, do everything in compute_ctx. (llama/18945)
28016940
ggml-zdnn : mark zDNN buffers as non-host (llama/18967)
6863e60a
opencl: add TRI op support (llama/18979)
3e1b8360
CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
c511b403
opencl: enable the general fp mm for non-cont input and as a fallback…
fffe9d00
CUDA: fix alignment check for FA (llama/19023)
e2d86f7b
mla : make the V tensor a view of K (llama/18986)
ff0f90f9
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementat…
1708c1e1
use malloc to support both iGPU and dGPU at the same time (llama/18992)
29ba416e
ggml-hexagon: flash-attn opt (llama/19025)
af0790bf
ggml-cuda: enable cuda-graphs for `n-cpu-moe` (llama/18934)
4abd7e2b
CUDA: re-use MLA K data for V in MMA FA (llama/19057)
b52f20c2
kv-cache : support V-less cache (llama/19067)
83a0a997
ggml-cpu: Use tiled FA for prompt-processing (llama/19012)
72d9d3e9
metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/m…
024f396d
CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
cde57edc
CUDA: fix padding of GQA to power of 2 in FA (llama/19115)
247e5234
opencl: add flattened q6_K mv (llama/19054)
8e3b59de
ggml-cpu: Enable FP16 MMA kernels on PPC (llama/19060)
861dc273
Reduce CPU-side stalls due to the CUDA command buffer being full (lla…
a415aa4d
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementat…
91777092
CUDA: tune GLM 4.7 Flash FA kernel selection logic (llama/19097)
5cc23d5b
ggml-zendnn : update ZenDNN git tag to main branch (llama/19133)
25fbee9c
ggml webgpu: Split shared state (webgpu_context) into global state an…
efc07fbe
CUDA: tune GLM 4.7 Flash FA kernel selection logic (DGX Spark) (llama…
4a227b50
cuda : fix "V is K view" check for non-unified KV cache (llama/19145)
e80c5899
ggml-cpu: arm64: Q4_K scale unroll and vectorization (llama/19108)
6551c7a0
ggml: new backend for Virglrenderer API Remoting acceleration (v2) (l…
ca91a430
vulkan: handle device dedup on MacOS + Vega II Duo cards (llama/19058)
389a7e22
ggml-sycl: remove unused syclcompat header (llama/19140)
2e291680
Vulkan Flash Attention Coopmat1 Refactor (llama/19075)
a3958464
sycl: fix norm kernels: l2_norm, group_norm, rms_norm by removing asser…
0a7ea4b3
CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.…
71627ccd
ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (…
7bed9d18
HIP: add mmf for CDNA (llama/18896)
e5891014
cuda : fix nkvo, offload and cuda graph node properties matching (lla…
38bb4d64
hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama…
99076001
ggml-webgpu: improve FlashAttention performance by software pipelinin…
6cf5828d
sycl: implement GGML_OP_TRI (llama/19089)
2466cf11
sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114)
c5003e9b
add tensor type checking as part of cuda graph properties (llama/19186)
33b2ffca
sync : ggml
859f583d
talk-llama : sync llama.cpp
54d3fbc1
danbev
approved these changes
on 2026-01-30
cuda : fix compile warnings (#0)
ced24648
ggerganov
merged
acbace05
into master 3 days ago
ggerganov
deleted the sync-ggml-26-01-30 branch 3 days ago