vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron …
b1777fc3
ggml-cuda: remove unnecessary prints on ggml_cuda_init (llama/18502)
e4b32ccd
cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (…
9de6e142
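The large-tensor copy fix above concerns byte counts that exceed `INT_MAX` and trip the `ggml_nbytes <= INT_MAX` assertion. One common way to stay within an `int`-typed size limit is to split the copy into bounded chunks; the helper below is a minimal illustrative sketch of that idea, not the actual ggml/CUDA code:

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstring>

// Copy `nbytes` bytes in chunks no larger than INT_MAX, so that each
// individual copy stays within the range of an int-typed size parameter.
// (Hypothetical helper for illustration; the real fix lives in ggml's
// CUDA backend and uses device copy APIs rather than memcpy.)
inline void copy_chunked(char * dst, const char * src, size_t nbytes) {
    size_t offset = 0;
    while (offset < nbytes) {
        const size_t chunk = std::min(nbytes - offset, (size_t) INT_MAX);
        std::memcpy(dst + offset, src + offset, chunk);
        offset += chunk;
    }
}
```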
rpc : use unordered_map::reserve and emplace (llama/18513)
e70bc419
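The rpc change above replaces plain insertions with `unordered_map::reserve` plus `emplace`: reserving sizes the hash table once up front (avoiding rehashes mid-loop), and `emplace` constructs each entry in place instead of copying a temporary pair. A generic sketch of the pattern (the data and function name here are illustrative, not the actual rpc code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Build a lookup table from ids to names. reserve() allocates the
// bucket array once, so no rehashing happens during the loop, and
// emplace() constructs each key/value pair directly in the node.
inline std::unordered_map<uint64_t, std::string>
build_table(const std::vector<std::pair<uint64_t, std::string>> & items) {
    std::unordered_map<uint64_t, std::string> table;
    table.reserve(items.size());
    for (const auto & it : items) {
        table.emplace(it.first, it.second);
    }
    return table;
}
```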
metal : adjust extra size for FA buffer to avoid reallocations (llama…
868b0fdf
vulkan: Implement mmvq for iq1_s/iq1_m (llama/18450)
63266b0f
vulkan: Optimize GGML_OP_CUMSUM (llama/18417)
9b5c0cc4
ggml-hexagon: optimize activation function (llama/18393)
1059e04f
(Bugfix, ggml-cuda) Pool alloc count fix + small size computation typ…
4face37a
CUDA: only allocate FA tmp buffer if needed (llama/18564)
77a71748
ggml-cuda: fixes for concurrent streams (llama/18496)
e1f2d207
ggml-cuda: remove unused params in ggml_cuda_graph (llama/18579)
42102e2d
CUDA: disable cuda graph when using n-cpu-moe (llama/18593)
5f0ef6a8
sampling : add support for backend sampling (llama/17004)
6534b0e8
CANN: add operator fusion support for ADD + RMS_NORM (llama/17512)
36d41d86
vulkan: handle quantize_q8_1 overflowing the max workgroup count (lla…
8fdefeb7
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (llama/18582)
f58ebe6c
ggml-cuda: check for srcs outside the cgraph (llama/18583)
f1cc70fa
CUDA: fix FA FP16 accumulator overflow for Granite (llama/18614)
30f7deb4
ggml webgpu: add CEIL operation support (llama/18605)
928e20c8
CANN: Make `valid_values` variable `static const` (llama/18627)
05ebbf8f
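Marking a function-local lookup table `static const`, as in the CANN change above, means it is initialized once on first use and shared by all calls, instead of being rebuilt on the stack every invocation. A minimal illustration (the contents of `valid_values` below are invented, not taken from the CANN backend):

```cpp
#include <algorithm>
#include <array>

// Hypothetical validity check: is `v` one of the accepted values?
// `static const` turns the table into a one-time initialization
// rather than a per-call stack array.
inline bool is_valid_value(int v) {
    static const std::array<int, 4> valid_values = {0, 1, 2, 4};
    return std::find(valid_values.begin(), valid_values.end(), v)
           != valid_values.end();
}
```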
ggml : fix avx512bf16 build (llama/18623)
8470e13e
mmq.cu: tune mmq/rocblas switching for RDNA (llama/18537)
4d6a7ee1
ggml-cuda: refactor cuda graph usage (llama/18637)
3963f7c5
vulkan: support buffer_from_host_ptr (llama/18467)
954f87a3
ggml : optimize cuda ssm_scan using warp-level reduction (llama/18505)
718693f2
Hexagon add support for f16/f32 flash attention, scale, set-rows and …
b67a0fc7
CANN: Rename `get_env` to `get_env_as_lowercase` (llama/18624)
0ed0e41b
CANN: Fix rename for get_env (llama/18652)
9c5a3dc5
vulkan: more mul mat optimizations (llama/18533)
92948849
vulkan: Warptile tuning for Intel Xe2/Xe3 (llama/18178)
517629e9
vulkan: reject ops when a tensor is too large to allocate (llama/18646)
f8f39015
cuda : fix build on cuda 12.8 (llama/18672)
b0a98658
opencl: add FILL op support (llama/18682)
f6c1a9dc
ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (llama/18535)
cc6d375a
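The `GGML_OP_OFFLOAD_MIN_BATCH` change above adds an environment-variable knob. A common way to implement such a knob is a `getenv` read with a numeric parse and a fallback default; the sketch below is an assumption about the shape of that logic (the parsing rules and default are hypothetical, not ggml's actual implementation):

```cpp
#include <cstdlib>

// Read an integer setting from the environment, falling back to
// `default_value` when the variable is unset or not a number.
// (Illustrative helper; ggml's real parsing may differ.)
inline int env_int_or(const char * name, int default_value) {
    const char * s = std::getenv(name);
    if (s == nullptr) {
        return default_value;
    }
    char * end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (end == s) {
        return default_value; // no digits parsed
    }
    return (int) v;
}
```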
metal : add MoE kernel specialization for ne20=5 (llama/18667)
d5fe8dda
vulkan: optimize ssm_scan (llama/18630)
cea84fa5
vulkan: fix push constant size for quantize_q8_1 (llama/18687)
ca716811
ggml webgpu: initial flashattention implementation (llama/18610)
d4c7a35e
ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (llama/18628)
a0670c78
llama: use host memory if device reports 0 memory (llama/18587)
c48c9664
Updates to webgpu get_memory (llama/18707)
55fd887c
opencl: add EXPM1 op (llama/18704)
f4ac80a8
Corrected: changed s13 = src1->nb[3] instead of nb[2] (llama/18724)
d2aceabb
cmake : update blas logic (llama/18205)
a17a360c
HIP: adjust RDNA3.5 MMQ kernel selection logic (llama/18666)
b70ed226
opencl: add SOFTPLUS op support (llama/18726)
818127bc
sync : ggml
500facbc
talk-llama : sync llama.cpp
7c4c31cf
danbev approved these changes on 2026-01-12
Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support …
b56b1f5a
vulkan: Disable large coopmat matmul configuration on proprietary AMD…
a3b19b27
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id)…
7ab0bcb7
vulkan: change memory_logger to be controlled by an env var (llama/18…
a63ba570
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (llama/18800)
784a46f3
sync : ggml
4230998e
ggerganov merged 47af2fb7 into master 50 days ago
ggerganov deleted the sync-ggml-26-01-14 branch 50 days ago