sync : ggml #3470

ggerganov merged 36 commits into master from sync-ggml-25-11-12
69943f8b lhez: opencl: support ne3 in get_rows (llama/15866)
3a5a3546 reeselevine: ggml webgpu: support for rope,div,sub,glu,scale,cont operators (llama…
a57c9f69 lhez: opencl: support pad_ext (llama/15888)
032abbcc netrunnereve: vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
4d08090e IMbackK: HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
3b2df32a yeahdongcn: musa: update compile flags (llama/16265)
5cd34243 pwilkin: model : Apertus model implementation (llama/15852)
cc6dc14e reeselevine: ggml webgpu: add support for soft_max, optimize rms_norm (llama/16357)
3f2ecffc jeffbolznv: vulkan: in flash attention, bounds check against nem1 (don't rely on …
fe538c22 jeffbolznv: vulkan: Fix FA coopmat1 invalid array indexing (llama/16365)
2e235913 jeffbolznv: vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (ll…
3d3000fb Acly: ggml : fix graph reallocation with multiple chunks (llama/16396)
5f895996 ggerganov: metal : fix loop bound in ggml_mem_ranges (llama/16412)
75159b53 Acly: vulkan : incremental shader builds (llama/16341)
0c56ec3e rgerganov: rpc : add support for multiple devices (llama/16276)
98b549d5 rgerganov: rpc : check src buffer when copying tensor (llama/16421)
6e7e1b8d netrunnereve: vulkan: use a more appropriate amount of threads when generating shad…
72b9fa00 reeselevine: ggml webgpu: actually add softmax, fix rms_norm offset (llama/16400)
73265c03 danbev: ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (llama…
352a07a2 ggerganov: ggml : fix unaligned access in AMX code (llama/16315)
389681e7 ggerganov: metal : various optimizations + refactoring (llama/16446)
091a5c11 ggerganov: tests : add -INF blocks to the KQ mask in the FA tests (llama/16380)
d75f9ae9 ggerganov: metal : add support for non-padded FA KV (llama/16148)
c8d88fc2 reeselevine: ggml webgpu: profiling, CI updates, reworking of command submission (…
1b7b1200 ggerganov: metal : mark FA blocks (llama/16372)
57d8e6b1 ai-fonsi: Disable CUDA host buffers on integrated GPUs (llama/16308)
73b3339f NeoZhangJianyu: refactor soft_max, add soft_max_back (llama/16472)
ba2e955f chaxu01: kleidiai: kernel interface refactoring (llama/16460)
910395c5 noemotiovon: CANN: Improve ACL graph matching (llama/16166)
779ca59c duduta: cpu : optimize the ggml NORM operation (llama/15953)
667e3645 mehendarkarprajwal: cmake : Dont define XOPENSOURCE on AIX (llama/16481)
d4775054 slaren: cuda : avoid initializing unused devices (llama/16510)
33f78624 ggerganov: metal : fix mul-mm condition + fix mul-mv permuted kernels (llama/16494)
4f776684 ggerganov: sync : ggml
2ad7a695 ggerganov: talk-llama : sync llama.cpp
55d8f017 ggerganov: bench : update [no ci]
danbev approved these changes on 2025-10-12
ggerganov merged ea174c62 into master 156 days ago
ggerganov deleted the sync-ggml-25-11-12 branch 156 days ago
