sync : ggml #3148

ggerganov merged 21 commits into master from sync-ggml-25-05-13
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (llama…
0c4a2291
jeffbolznv vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326)
19d8d9a9
rgerganov rpc : add rpc_msg_set_tensor_hash_req (llama/13353)
00c80567
JohannesGaessler CUDA: fix crash on large batch size for MoE models (llama/13384)
f8c75dc4
JohannesGaessler CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
aef59f48
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12…
b493e03b
jeffbolznv vulkan: scalar flash attention implementation (llama/13324)
22f4997d
JohannesGaessler CUDA: fix FlashAttention on Turing (llama/13415)
04445664
JohannesGaessler CUDA: fix race conditions FlashAttention kernels (llama/13438)
86dece9c
hjc4869 Add `--no-op-offload` to improve `-ot` pp perf in MoE models like lla…
0b1962a1
JohannesGaessler CUDA: fix crash with partial offloading of MoE (llama/13439)
c4268297
AD2605 enable dpcpp nightly builds with libraries (llama/13406)
882d9757
JohannesGaessler CUDA: fix misaligned synchronization in FA (llama/13469)
8264872b
eddnjjn ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053)
cb90cb09
JohannesGaessler llama/ggml: add LLM training support (llama/10544)
fe0d52b9
lhez opencl: remove unnecessary assert for `add` (llama/13257)
43a59ecc
ggerganov metal : optimize MoE for large batches (llama/13388)
926e06db
ngxson ggml : add mrope kernel for metal (llama/13457)
79fb43e2
ggerganov sync : ggml
89970b9a
ggerganov whisper : update to ggml-backend changes (#0)
69753804
ggerganov talk-llama : sync llama.cpp
bff8dc24
danbev approved these changes on 2025-05-13
ggerganov merged f8905605 into master 259 days ago
