sync : ggml #3636

ggerganov merged 70 commits into master from sync-ggml-26-01-30
zhang-hui-yulo HIP: add fattn-mma-f16 for RDNA4 (llama/18481)
e94d11aa
DaAwesomeP ggml-metal: do not copy headers for embedded, use current binary dir …
241d378e
0cc4m vulkan: work around Intel fp16 bug in mmq (llama/18814)
d1549657
danbev CUDA : fix typo in clang pragma comment [no ci] (llama/18830)
848a34ae
jeffbolznv vulkan: Check maxStorageBufferRange in supports_op (llama/18709)
a4c04139
ORippler CUDA: Factor out and re-use `block_reduce` function (llama/18785)
ed8e1b9c
max-krasnyansky hexagon: support for OP_CPY, host buffers now optional (llama/18822)
a49e6887
shalinib-ibm ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (llama/18837)
b264175e
JohannesGaessler CUDA: fix alignment on register spill for FA (llama/18815)
0ed573e2
ggerganov cuda : print less debug logs when disabling cuda graphs (llama/18868)
da3b1b73
shaofeiqi OpenCL: add SOLVE_TRI op support (llama/18846)
3aa5e7ec
hipudding CANN: support gated linear attn (llama/18653)
3355e081
noemotiovon CANN: fix an issue where get_env was not fully renamed (llama/18796)
b59fcb02
rauletorresc CANN: Remove unused `ggml_cann_get_device` function (llama/18625)
60497ff4
DaAwesomeP ggml-blas: hide warnings from included BLAS headers (llama/18818)
a5fab81a
ThoreKoritzius ggml : extend ggml_pool_1d + metal (llama/16429)
cdf61be2
reeselevine ggml webgpu: support for backend sampling (llama/18880)
391bdb90
lhez opencl: fix q6_K mv for m=1 (llama/18893)
758c3c44
ggerganov ggml : add ggml_build_forward_select (llama/18550)
867ca7da
ggerganov metal : enable FA for MLA heads (llama/18950)
a987d92e
angt ggml : cleanup path_str() (llama/18928)
6d6f9866
ORippler CUDA: Replace init_offsets kernel with iterators in cub-based argsort…
cb6577c4
ORippler CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator…
afe234b9
jeffbolznv vulkan: Use mul_mat_vec_id for small values of n (llama/18918)
85e51c0b
rillomas Revert "vulkan: force full subgroups for flash attention to fix intel…
cf88fb0a
jeffbolznv vulkan: support flash attention GQA/split_k with small batches (llama…
f1c37d25
jeffbolznv vulkan: Remove transfer_ctx, do everything in compute_ctx. (llama/18945)
28016940
AlekseiNikiforovIBM ggml-zdnn : mark zDNN buffers as non-host (llama/18967)
6863e60a
shaofeiqi opencl: add TRI op support (llama/18979)
3e1b8360
am17an CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
c511b403
lhez opencl: enable the general fp mm for non-cont input and as a fallback…
fffe9d00
JohannesGaessler CUDA: fix alignment check for FA (llama/19023)
e2d86f7b
ggerganov mla : make the V tensor a view of K (llama/18986)
ff0f90f9
Alcpz ggml-cpu: aarch64: q5_K repack gemm and gemv (and generic) implementat…
1708c1e1
arthw use malloc to support both iGPU and dGPU at the same time (llama/18992)
29ba416e
chraac ggml-hexagon: flash-attn opt (llama/19025)
af0790bf
am17an ggml-cuda: enable cuda-graphs for `n-cpu-moe` (llama/18934)
4abd7e2b
JohannesGaessler CUDA: re-use MLA K data for V in MMA FA (llama/19057)
b52f20c2
ggerganov kv-cache : support V-less cache (llama/19067)
83a0a997
am17an ggml-cpu: Use tiled FA for prompt-processing (llama/19012)
72d9d3e9
ccbinn metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/m…
024f396d
JohannesGaessler CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
cde57edc
JohannesGaessler CUDA: fix padding of GQA to power of 2 in FA (llama/19115)
247e5234
lhez opencl: add flattened q6_K mv (llama/19054)
8e3b59de
shalinib-ibm ggml-cpu: Enable FP16 MMA kernels on PPC (llama/19060)
861dc273
gaugarg-nv Reduce CPU-side stalls due to the CUDA command buffer being full (lla…
a415aa4d
Alcpz ggml-cpu: aarch64: q6_K repack gemm and gemv (and generic) implementat…
91777092
JohannesGaessler CUDA: tune GLM 4.7 Flash FA kernel selection logic (llama/19097)
5cc23d5b
z-vishal ggml-zendnn : update ZenDNN git tag to main branch (llama/19133)
25fbee9c
nikhilJain17 ggml webgpu: Split shared state (webgpu_context) into global state an…
efc07fbe
ggerganov CUDA: tune GLM 4.7 Flash FA kernel selection logic (DGX Spark) (llama…
4a227b50
ggerganov cuda : fix "V is K view" check for non-unified KV cache (llama/19145)
e80c5899
Alcpz ggml-cpu: arm64: Q4_K scale unroll and vectorization (llama/19108)
6551c7a0
kpouget ggml: new backend for Virglrenderer API Remoting acceleration (v2) (l…
ca91a430
okuvshynov vulkan: handle device dedup on MacOS + Vega II Duo cards (llama/19058)
389a7e22
PatKamin ggml-sycl: remove unused syclcompat header (llama/19140)
2e291680
0cc4m Vulkan Flash Attention Coopmat1 Refactor (llama/19075)
a3958464
arthw sycl: fix norm kernels: l2_norm, group_norm, rms_norm by removing asser…
0a7ea4b3
am17an CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.…
71627ccd
z-vishal ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (…
7bed9d18
zhang-hui-yulo HIP: add mmf for CDNA (llama/18896)
e5891014
ggerganov cuda : fix nkvo, offload and cuda graph node properties matching (lla…
38bb4d64
tboinovski1 hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama…
99076001
ArberSephirotheca ggml-webgpu: improve flashAttention performance by software pipelinin…
6cf5828d
RachelMantel sycl: implement GGML_OP_TRI (llama/19089)
2466cf11
s8322 sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114)
c5003e9b
bssrdf add tensor type checking as part of cuda graph properties (llama/19186)
33b2ffca
ggerganov sync : ggml
859f583d
ggerganov talk-llama : sync llama.cpp
54d3fbc1
danbev approved these changes on 2026-01-30
ggerganov cuda : fix compile warnings (#0)
ced24648
ggerganov merged acbace05 into master 3 days ago
ggerganov deleted the sync-ggml-26-01-30 branch 3 days ago
