sync : ggml #3470

ggerganov merged 36 commits into master from sync-ggml-25-11-12
69943f8b lhez: opencl: support ne3 in get_rows (llama/15866)
3a5a3546 reeselevine: ggml webgpu: support for rope,div,sub,glu,scale,cont operators (llama…
a57c9f69 lhez: opencl: support pad_ext (llama/15888)
032abbcc netrunnereve: vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
4d08090e IMbackK: HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
3b2df32a yeahdongcn: musa: update compile flags (llama/16265)
5cd34243 pwilkin: model : Apertus model implementation (llama/15852)
cc6dc14e reeselevine: ggml webgpu: add support for soft_max, optimize rms_norm (llama/16357)
3f2ecffc jeffbolznv: vulkan: in flash attention, bounds check against nem1 (don't rely on …
fe538c22 jeffbolznv: vulkan: Fix FA coopmat1 invalid array indexing (llama/16365)
2e235913 jeffbolznv: vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (ll…
3d3000fb Acly: ggml : fix graph reallocation with multiple chunks (llama/16396)
5f895996 ggerganov: metal : fix loop bound in ggml_mem_ranges (llama/16412)
75159b53 Acly: vulkan : incremental shader builds (llama/16341)
0c56ec3e rgerganov: rpc : add support for multiple devices (llama/16276)
98b549d5 rgerganov: rpc : check src buffer when copying tensor (llama/16421)
6e7e1b8d netrunnereve: vulkan: use a more appropriate amount of threads when generating shad…
72b9fa00 reeselevine: ggml webgpu: actually add softmax, fix rms_norm offset (llama/16400)
73265c03 danbev: ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (llama…
352a07a2 ggerganov: ggml : fix unaligned access in AMX code (llama/16315)
389681e7 ggerganov: metal : various optimizations + refactoring (llama/16446)
091a5c11 ggerganov: tests : add -INF blocks to the KQ mask in the FA tests (llama/16380)
d75f9ae9 ggerganov: metal : add support for non-padded FA KV (llama/16148)
c8d88fc2 reeselevine: ggml webgpu: profiling, CI updates, reworking of command submission (…
1b7b1200 ggerganov: metal : mark FA blocks (llama/16372)
57d8e6b1 ai-fonsi: Disable CUDA host buffers on integrated GPUs (llama/16308)
73b3339f NeoZhangJianyu: refactor soft_max, add soft_max_back (llama/16472)
ba2e955f chaxu01: kleidiai: kernel interface refactoring (llama/16460)
910395c5 noemotiovon: CANN: Improve ACL graph matching (llama/16166)
779ca59c duduta: cpu : optimize the ggml NORM operation (llama/15953)
667e3645 mehendarkarprajwal: cmake : Dont define XOPENSOURCE on AIX (llama/16481)
d4775054 slaren: cuda : avoid initializing unused devices (llama/16510)
33f78624 ggerganov: metal : fix mul-mm condition + fix mul-mv permuted kernels (llama/16494)
4f776684 ggerganov: sync : ggml
2ad7a695 ggerganov: talk-llama : sync llama.cpp
55d8f017 ggerganov: bench : update [no ci]
danbev approved these changes on 2025-10-12
ggerganov merged ea174c62 into master 156 days ago
ggerganov deleted the sync-ggml-25-11-12 branch 156 days ago
