PR #3710 sync : ggml - SemanticDiff

ggml-cpu: add repack for mxfp4 (llama/19738)

8f3aa2fd

CUDA: add CDNA3 MFMA support for flash attention MMA kernel (llama/19…

50646889

cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernel…

681debff

vulkan: improve partial offloading performance on AMD (llama/19976)

91e9af83

ggml-cpu: optimise s390x multiply extend instructions (llama/20032)

1810a80b

vulkan: tune MMVQ for Intel Windows (llama/19988)

47b12eae

ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1…

84e45a60

ggml webgpu: Clean up per-thread parameter buffer pool and job submis…

8be81d37

ggml webgpu: fix workgroup dispatch limit for large batch sizes (llam…

a444c8a0

opencl: add optimized q4_1 mm kernel for adreno (llama/19840)

d89fc23b

kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (llama/…

2eeb5e3a

ggml : use a simple std::thread in AMX without OpenMP (llama/20074)

1a9b0f9f

ggml: fix ggml_is_contiguous_n for ne == 1 (llama/20092)

ffe593bb

Add concat op to webgpu. (llama/20068)

c456e26e

Fix wait logic for inflight jobs (llama/20096)

6ae853c3

opencl: add `SET`, support i32 for `CPY`, minor refactor for cpy (lla…

38cc52c1

hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and M…

0964d663

chore : correct typos [no ci] (llama/20041)

484163ec

CUDA: Improve performance via less synchronizations between token (ll…

22502c04

hexagon: add fp16 support for binary ops: add,sub,mul,div (llama/20139)

15ff3f5b

opencl: add neg, exp and diag (llama/20127)

dc6b7229

ggml-cpu: fix data race for debug asserts (llama/20148)

a27e51cb

CUDA: use shared mem for ssm_conv (llama/20128)

cbd9e948

ggml-cpu: Fix gcc 15 ICE on ppc64le (ggml/20083) (llama/20130)

96fb6151

ggml: update comments for backends which have no memory to report (ll…

7f38f2a2

ggml-cuda: add mem check for fusion (llama/19916)

6e34b302

cpu: skip redudant ROPE cache updates (llama/20149)

9bfd03b9

hexagon: add f32 ssm_conv op (llama/20122)

d5ea0591

quants : Add memsets and other fixes for IQ quants (llama/19861)

94de6807

opencl: add l2_norm (llama/20160)

78531aa0

ggml: add GATED_DELTA_NET op (llama/19504)

bc20d1ab

supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (llama/20190)

b61009bb

vulkan: Fix data races in coopmat1 mul_mat(_id) (llama/20084)

df65a360

ggml-vulkan: Add ELU op support (llama/20183)

18a8e3d5

cuda : display total and free VRAM capacity during device initializat…

47ba98e1

vulkan: skip zero size tensors in backend copies (llama/20233)

e8501ded

ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (l…

f328ac96

ggml-cuda: disable gdn for musa (llama/20278)

e307d934

metal : add upscale (llama/20284)

dac5a06a

metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (llama/20250)

6a1de6f8

metal: handle command buffer failures gracefully in synchronize (llam…

d2a96485

ggml-cpu: add RVV repack GEMM and GEMV for quantization types (llama/…

ef15437e

kleidiai : support for concurrent sme and neon kernel execution (llam…

e155c541

ggml webgpu: faster normal quant and some k-quant matrix operations, …

bf2d45a6

ggml : bump RPC version (llama/20330)

f1265f13

fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (llam…

ec414e9d

fix op rope, add rope_back (llama/20293)

e3db9561

cuda/hip: fix loop unrolling in ssm-conv (llama/20369)

ff5fa922

ggml-cuda: gdn use shared mem for HIP (llama/20366)

42df97a1

metal : add env var to trigger graph capture (llama/20398)

80655c9a

metal : fix q5_k mul_mv register spill (llama/20399)

dafed294

metal : fix capture_compute counter logic (llama/20410)

ed7718a3

llama : add support for Nemotron 3 Super (llama/20411)

e19ec84c

ggml : add NVFP4 quantization type support (llama/19769)

e56a3c19

llama : enable chunked fused GDN path (llama/20340)

9f32749f

ggml-webgpu: Add supports for `GGML_OP_REPEAT` (llama/20230)

0ec906b9

hip: compile debug builds with -O2 on hip to avoid a compiler bug (ll…

20e71614

opencl: add cumsum op (llama/18981)

c8fc662f

opencl: use larger workgroup size for get_rows (llama/20316)

b14f9f58

vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large mode…

00917d94

vulkan: fix OOB check in flash_attn_mask_opt (llama/20296)

812c45ee

vulkan: fix l2_norm epsilon handling (llama/20350)

8ca1798d

sync : ggml

570e1466

metal : avoid divisions in bin kernel (llama/20426)

e18c6d28

sync : ggml

606ee599

vulkan: fix SSM_CONV PP scaling with large ubatch sizes (llama/20379)

e057ff00

vulkan: add GATED_DELTA_NET op support (llama/20334)

f25cbd60

llama : disable graph reuse with pipeline parallelism (llama/20463)

f5025aec

metal : fix l2 norm scale (llama/20493)

8a33e87a

ggml : fix typo gmml (llama/20512)

7f44d6b7

ggml-cpu: add RVV vec dot kernels for quantization types (llama/18859)

5022e7eb

graph : remove redundant GDN state transposes (llama/20443)

97854bea

opencl: fix l2_norm (llama/20480)

deaa2db4

Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT ope…

bdc9bac0

ggml : add OpenVINO backend (llama/15307)

7a14c004

Use fp32 in cuBLAS V100 to avoid overflows, env variables to override…

cfa02a01

ggml : add native AVX512-FP16 support for F16 operations (llama/20529)

5389551b

add op gated_delta_net (llama/20455)

b3106ec5

hexagon: Q4_0 and MXFP4 repack fixes (llama/20527)

21773e33

metal : add FA specialization for HSK = 320, HSV = 256 (llama/20549)

d752d737

vulkan: use graphics queue on AMD (llama/20551)

c43c45c5

cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (llama…

f9cd8348

ggml : guard against sumq2 being 0 in IQ4_NL (llama/20460)

25a75674

ggml/hip: fix APU compatibility - soft error handling for hipMemAdvis…

603bc50d

ggml: avoid creating CUDA context during device init (llama/20595)

d79032c9

CUDA: limit number of FA stream-k CUDA blocks (llama/20586)

491d5129

common : add nvfp4 (ggml/0)

4ebc6f58

ggml : extend im2col f16 (ggml/1434)

83fed291

sync : ggml

2f41a39b

talk-llama : sync llama.cpp

678c2b42

danbev approved these changes on 2026-03-16

ggml : try fix arm build (#0)

ae853f46

ggerganov merged 27fa2077 into master 73 days ago

ggerganov deleted the sync-ggml-26-03-16 branch 73 days ago

whisper.cpp: sync : ggml #3710 (Merged)
