sync : ggml #3583

ggerganov merged 59 commits into master from sync-ggml-25-12-31
- cde9758b zhang-hui-yulo: HIP: Refactor mma for RDNA and CDNA (llama/17990)
- 126deb39 Alcpz: ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (llama/18096)
- 2d9e9d54 joeldushouyu: ggml-hexagon: gelu operation (llama/17921)
- 013ebf19 joeldushouyu: ggml-hexagon: swiglu_oai operation (llama/18114)
- b03ed79b zhang-hui-yulo: remove i_major_dual (llama/18157)
- f9d36bd8 taimur-10x: ggml-cpu: extend support for RVV floating-point kernels (llama/17318)
- f09b84b0 ngxson: model : add ASR support for LFM2-Audio-1.5B (conformer) (llama/18106)
- baa5c1db jeffbolznv: vulkan: Add perf logger mode with concurrency (llama/17944)
- 5a1735a7 ngdxzy: ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for mor…
- 1dd0dfee Aadeshveer: Added comments explaining thread block size selection logic based on …
- 216ddd8a lovedheart: Vulkan: some improvement on mul_mat_iq2_xs (llama/18031)
- 3218970f jeffbolznv: vulkan: in graph_optimize, try to group ADD operations (llama/18060)
- 9a60d12d jeffbolznv: vulkan: support GGML_UNARY_OP_XIELU (llama/18062)
- 5506ddb2 jeffbolznv: vulkan/cuda: fix topk_moe with exp_probs_b (llama/18071)
- e5d89abd jeffbolznv: vulkan: fix im2col overflowing maxworkgroupcount (llama/18180)
- 092caa4c JohannesGaessler: llama: fix RPC for -fit on (llama/18233)
- 129f9631 jeffbolznv: vulkan: Implement set_tensor_async and the event interfaces (llama/18…
- ad5a5115 jeffbolznv: vulkan: Extend rope fusions to allow mrope (llama/18264)
- ca5e155b lhez: opencl: unpack q4_0 for adreno in get_tensor (llama/18278)
- 06b4ca96 taimur-10x: llamafile: add rvv support for sgemm kernels (llama/18199)
- de7f933f joeldushouyu: ggml-hexagon: gelu optimization (llama/18151)
- 81f95754 chraac: ggml-hexagon: create generalized functions for cpu side op (llama/17500)
- f5789f82 struct: rpc : add check for rpc buffer type (llama/18242)
- 56edad8e TianHao324: CANN: Uses yarn_ramp cache in ROPE (llama/17725)
- 61872d57 0cc4m: vulkan: use fewer FA rows for small cache runs (llama/18280)
- de40463b wangweixuan: CANN : refactor ACL graph cache (llama/17752)
- 4137b226 jeffbolznv: vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (…
- 32cc57d3 am17an: CUDA: experimental native mxfp4 support for blackwell (llama/17906)
- 17c42517 Aadeshveer: ggml : optimize cuda cumsum fallback kernel (llama/18343)
- a7b7da6f Intellouis: CANN: Add support for CONV_TRANSPOSE_1D when kernel size > 255 (llama…
- e1edaffb am17an: ggml-cuda: fix blackwell native builds (llama/18361)
- dc1099c2 am17an: cuda: optimize cumsum cub path (llama/18362)
- bc820e90 am17an: ggml-cuda: fix regex for arch list (llama/18371)
- 7c6e891b 0Marble: CANN: implement the SSM_CONV operator (llama/17737)
- a1cc693c jeffbolznv: vulkan: handle rope with large number of rows (llama/18306)
- 09bd04f3 jeffbolznv: vulkan: Support UPSCALE w/antialias (llama/18327)
- b311303e netrunnereve: vulkan: small dequantization improvements (llama/18380)
- d78f3150 jeffbolznv: vulkan: Use BK=32 for coopmat2 mul_mat_id (llama/18332)
- f8dfe0ab jeffbolznv: vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (llama/18349)
- d63cb2af jeffbolznv: vulkan: preprocess mul_mat_id experts and discard workgroups more qui…
- 78b9d734 am17an: ggml-cuda: Use same regex for GGML_NATIVE=OFF (llama/18407)
- 6a9983bf lhez: opencl: allow resizing transpose buffers (llama/18384)
- 2971350e QDelta: ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (l…
- c5ef983c bberberov: cmake: Added more x86_64 CPU backends when building with `GGML_CPU_AL…
- 76efccb1 o7si: rpc: fix segfault on invalid endpoint format (llama/18387)
- d50bcb10 am17an: Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATI…
- 60f59a1b IMbackK: HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases where a lot of s…
- dccb3be2 am17an: cuda: fix race condition in cumsum (llama/18448)
- 4795bbf8 JohannesGaessler: CUDA: Blackwell features for non-native builds (llama/18436)
- a29fc211 JohannesGaessler: CUDA: fix replacment of bad archs in CMake (llama/18457)
- db760ebb am17an: CUDA: add log line when mxfp4 acceleration is used (llama/18483)
- 5f634957 chaxu01: kleidiai: add and integrate SVE 256-bit vector-length kernel (llama/1…
- 46246528 rrsathe: Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (llam…
- a1780c7c am17an: sycl: add newline at the end of CMakeLists.txt (llama/18503)
- 3aad8322 ggerganov: metal : remove BF16 x F16 kernels (llama/18456)
- 60add562 JohannesGaessler: CUDA: fix KQ max calculation (llama/18487)
- 94169eac gatbontonpc: metal : add count_equal op (llama/18314)
- 3b318610 ggerganov: sync : ggml
- 9faea66e ggerganov: talk-llama : sync llama.cpp
danbev approved these changes on 2025-12-31
ggerganov merged 7359ac94 into master 13 days ago
ggerganov deleted the sync-ggml-25-12-31 branch 13 days ago
