sync : ggml #3606

ggerganov merged 55 commits into master from sync-ggml-26-01-14
jeffbolznv vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron …
b1777fc3
am17an ggml-cuda: remove unnecessary prints on ggml_cuda_init (llama/18502)
e4b32ccd
Meet91721 cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (…
9de6e142
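The large-tensor copy fix above addresses a `ggml_nbytes <= INT_MAX` assertion. A common way to avoid 32-bit size limits in a single copy call is to split the transfer into bounded chunks. A minimal sketch of that pattern (the helper name `copy_in_chunks` is hypothetical, and `std::memcpy` stands in for the per-chunk device copy):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy nbytes in pieces no larger than max_chunk,
// so each individual copy stays within a 32-bit size limit.
void copy_in_chunks(void *dst, const void *src, size_t nbytes, size_t max_chunk) {
    char *d = static_cast<char *>(dst);
    const char *s = static_cast<const char *>(src);
    while (nbytes > 0) {
        const size_t n = std::min(nbytes, max_chunk); // bounded chunk size
        std::memcpy(d, s, n); // stand-in for the per-chunk device copy
        d += n;
        s += n;
        nbytes -= n;
    }
}
```

In the real backend the chunk size would be chosen so that each device copy's byte count fits the API's size type; the loop structure is the same.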
struct rpc : use unordered_map::reserve and emplace (llama/18513)
e70bc419
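The rpc change above swaps in `unordered_map::reserve` and `emplace`. The general idea: `reserve` sizes the hash table once up front instead of rehashing as it grows, and `emplace` constructs the value in place instead of default-constructing it first as `operator[]` does. A minimal sketch, with a made-up `make_table` helper:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Sketch of the pattern: pre-size the table, then construct entries in place.
std::unordered_map<uint64_t, std::string> make_table(size_t expected) {
    std::unordered_map<uint64_t, std::string> table;
    table.reserve(expected);        // one hash-table allocation up front
    for (uint64_t i = 0; i < expected; ++i) {
        table.emplace(i, "buffer"); // constructs the mapped value in place
    }
    return table;
}
```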
ggerganov metal : adjust extra size for FA buffer to avoid reallocations (llama…
868b0fdf
jeffbolznv vulkan: Implement mmvq for iq1_s/iq1_m (llama/18450)
63266b0f
jeffbolznv vulkan: Optimize GGML_OP_CUMSUM (llama/18417)
9b5c0cc4
joeldushouyu ggml-hexagon: optimize activation function (llama/18393)
1059e04f
pl752 (Bugfix, ggml-cuda) Pool alloc count fix + small size computation typ…
4face37a
JohannesGaessler CUDA: only allocate FA tmp buffer if needed (llama/18564)
77a71748
am17an ggml-cuda: fixes for concurrent streams (llama/18496)
e1f2d207
am17an ggml-cuda: remove unused params in ggml_cuda_graph (llama/18579)
42102e2d
am17an CUDA: disable cuda graph when using n-cpu-moe (llama/18593)
5f0ef6a8
danbev sampling : add support for backend sampling (llama/17004)
6534b0e8
noemotiovon CANN: add operator fusion support for ADD + RMS_NORM (llama/17512)
36d41d86
jeffbolznv vulkan: handle quantize_q8_1 overflowing the max workgroup count (lla…
8fdefeb7
jeffbolznv vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (llama/18582)
f58ebe6c
am17an ggml-cuda: check for srcs outside the cgraph (llama/18583)
f1cc70fa
JohannesGaessler CUDA: fix FA FP16 accumulator overflow for Granite (llama/18614)
30f7deb4
tnguyen21 ggml webgpu: add CEIL operation support (llama/18605)
928e20c8
rauletorresc CANN: Make `valid_values` variable `static const` (llama/18627)
05ebbf8f
angt ggml : fix avx512bf16 build (llama/18623)
8470e13e
Beinsezii mmq.cu: tune mmq/rocblas switching for RDNA (llama/18537)
4d6a7ee1
am17an ggml-cuda: refactor cuda graph usage (llama/18637)
3963f7c5
jeffbolznv vulkan: support buffer_from_host_ptr (llama/18467)
954f87a3
Aadeshveer ggml : optimize cuda ssm_scan using warp-level reduction (llama/18505)
718693f2
max-krasnyansky Hexagon add support for f16/f32 flash attention, scale, set-rows and …
b67a0fc7
rauletorresc CANN: Rename `get_env` to `get_env_as_lowercase` (llama/18624)
0ed0e41b
hipudding CANN: Fix rename for get_env (llama/18652)
9c5a3dc5
netrunnereve vulkan: more mul mat optimizations (llama/18533)
92948849
virajwad vulkan: Warptile tuning for Intel Xe2/Xe3 (llama/18178)
517629e9
jeffbolznv vulkan: reject ops when a tensor is too large to allocate (llama/18646)
f8f39015
olliewalsh cuda : fix build on cuda 12.8 (llama/18672)
b0a98658
shaofeiqi opencl: add FILL op support (llama/18682)
f6c1a9dc
DocShotgun ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (llama/18535)
cc6d375a
dororodoroddo metal : add MoE kernel specialization for ne20=5 (llama/18667)
d5fe8dda
jeffbolznv vulkan: optimize ssm_scan (llama/18630)
cea84fa5
jeffbolznv vulkan: fix push constant size for quantize_q8_1 (llama/18687)
ca716811
reeselevine ggml webgpu: initial flashattention implementation (llama/18610)
d4c7a35e
yomaytk ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (llama/18628)
a0670c78
taronaeo llama: use host memory if device reports 0 memory (llama/18587)
c48c9664
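The commit above falls back to host memory when a device reports 0 bytes available. The decision itself is a simple guard; a minimal sketch under that assumption (the enum and `pick_backing` helper are hypothetical, not the actual llama.cpp API):

```cpp
#include <cstddef>

// Hypothetical sketch: choose a backing allocation based on reported
// device memory; a report of 0 bytes falls back to host memory.
enum class mem_kind { device, host };

mem_kind pick_backing(size_t device_free_bytes) {
    return device_free_bytes == 0 ? mem_kind::host : mem_kind::device;
}
```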
reeselevine Updates to webgpu get_memory (llama/18707)
55fd887c
shaofeiqi opencl: add EXPM1 op (llama/18704)
f4ac80a8
michaelw9999 Corrected: changed s13 = src1->nb[3] instead of nb[2] (llama/18724)
d2aceabb
DaAwesomeP cmake : update blas logic (llama/18205)
a17a360c
JohannesGaessler HIP: adjust RDNA3.5 MMQ kernel selection logic (llama/18666)
b70ed226
shaofeiqi opencl: add SOFTPLUS op support (llama/18726)
818127bc
ggerganov sync : ggml
500facbc
ggerganov talk-llama : sync llama.cpp
7c4c31cf
danbev approved these changes on 2026-01-12
0cc4m Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support …
b56b1f5a
0cc4m vulkan: Disable large coopmat matmul configuration on proprietary AMD…
a3b19b27
jeffbolznv vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id)…
7ab0bcb7
jeffbolznv vulkan: change memory_logger to be controlled by an env var (llama/18…
a63ba570
ggerganov CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (llama/18800)
784a46f3
ggerganov sync : ggml
4230998e
ggerganov merged 47af2fb7 into master 50 days ago
ggerganov deleted the sync-ggml-26-01-14 branch 50 days ago
