CUDA: faster tile FA, add oob checks, more HSs (llama/16492)
791e60a6
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (llama/16…
7847625f
ggml : Fix FP16 ELU positive branch (llama/16519)
45e26a5f
fix UT fault cases: count-equal, argsort, pad OPs (llama/16521)
99d07411
metal : add opt_step_adamw and op_sum (llama/16529)
1a08f1d9
CANN: Update several operators to support FP16 data format (llama/16251)
f9de4e2a
ggml : fix scalar path for computing norm (llama/16558)
6dd86087
metal: add support for opt_step_sgd (llama/16539)
5ef11175
CANN: fix CPU memory leak in CANN backend (llama/16549)
b7c7d0c7
ggml : fix build broken with -march=armv9-a on MacOS (llama/16520)
7ce6c536
CUDA: fix numerical issues in tile FA kernel (llama/16540)
e2b9c209
opencl: fix build targeting CL 2 (llama/16554)
6839554b
metal : FA support F32 K and V and head size = 32 (llama/16531)
d98a1645
cuda : remove legacy copy-op pointer indirection code (llama/16485)
d541d242
CUDA: add fp kernel for larger batch size MoE (llama/16512)
17d67cab
CUDA: use fastdiv + ggml_cuda_mad for mmvf (llama/16557)
360acc78
CUDA: enable FA for FP32 KV cache (llama/16546)
395008b4
vulkan: Improve build time for MSVC (llama/16545)
0f82a3c5
vulkan: Support FA with K/V in F32 (llama/16543)
c03c4348
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
296f8c0b
vulkan: Add ACC_TYPE_VEC2 implementation (llama/16203)
84521445
sync : ggml
c5d5a808
talk-llama : sync llama.cpp
2eb25b13
danbev approved these changes on 2025-10-15
ggerganov merged 8ba3c13b into master 156 days ago
ggerganov deleted the sync-ggml-25-10-14 branch 156 days ago