sync : ggml #3526

ggerganov merged 51 commits into master from sync-ggml-25-11-17
6e2d45a4 0cc4m vulkan: fix memory allocations (llama/17122)
ce8d1da2 Acly cuda/vulkan : bicubic interpolation (llama/17022)
5c643599 fj-y-saito arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_do…
4cd5695c ggerganov metal : enable tensor API for A19 (llama/17087)
a64712e9 0cc4m vulkan: fix validation issue introduced by #16868 (llama/17145)
d1a83fbf 0cc4m vulkan: check glslc executable string (llama/17144)
e4c1e3cd angt ggml-cpu : inspect -march and -mcpu to found the CPU (llama/16333)
4413a561 ggerganov metal : cap threadgroups size of set_rows (llama/17146)
becc46e7 max-krasnyansky cpu: skip NOPs to avoid barriers (llama/17133)
485e4235 lhez opencl: add fastdiv and use it in set_rows, ported from cuda (llama/1…
bee75186 furrysalamander cmake : add version to all shared object files (llama/17091)
2fe28b67 chaxu01 kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993)
abbb5f2a duduta ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/1…
f52e7c75 ixgbe ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conver…
c3a1298c netrunnereve disable rms norm mul rope for chips with no fp16 rte (llama/17134)
32d1b349 max-krasnyansky hexagon: various Op fixes (llama/17135)
e9df9581 NeoZhangJianyu fix ci crash about SSM_CONV (llama/17169)
2f2c6c3b TecJesh CANN: Add L2_NORM op support (llama/16856)
6a2c71b9 Alcpz ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)
a541b0ef ggerganov ggml : use std::sort in ggml_argsort CPU implementation (llama/17211)
214d1af0 JohannesGaessler CUDA: static assert to prevent misuse of memcpy_1 (llama/17198)
be4d1303 am17an CUDA: fuse rope + set_rows (llama/16884)
c880b430 TecJesh CANN: Add cross_entropy_loss op support (llama/16886)
9808706d slaren ggml-cpu : use template for argsort (llama/17222)
b6d0ebe2 ggerganov Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" …
5150c23e bghira metal: accelerated conv2d (llama/17175)
273dd3fe ixgbe ggml-cpu : add RISC-V vector intrinsic support for silu and cvar oper…
312480c9 slaren sched : fix reserve ignoring user tensor assignments (llama/17232)
1b4c6ad1 0cc4m vulkan: remove shell call from vulkan-shaders-gen tool, revert file c…
e9b37f56 pwilkin ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063)
3e4ae291 Alcpz ggml-cpu: handle 3d tensors in repack mat_mul (llama/17241)
ae08083e ggerganov metal : make the FA extra sizes consistent (llama/17143)
a6f1d807 ggerganov metal : support argsort for ne00 > 1024 (llama/17247)
786e0056 jeffbolznv vulkan: change graph_compute to be async and enable get_tensor_async …
9d3fa94c jeffbolznv vulkan: skip all-negative-inf blocks in FA (llama/17186)
a175f857 jeffbolznv vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/…
89f82bfe giuseppe vulkan: implement ABS and NEG (llama/17245)
5ae41738 0cc4m vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AM…
5d9fba0a jeffbolznv vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287)
d7356143 shani-f sycl : unify unary kernels with a generic implementation and enable w…
14dac59d shaofeiqi opencl: add kernel to handle mat mul in attention to improve encoding…
9c2bde0f lhez opencl: fix rms_norm_mul (llama/17250)
844275a8 ggerganov metal : remove obosolete asserts (llama/17295)
75cfe4a6 0cc4m vulkan: fix MMQ quantize_y condition (llama/17301)
4f694e4f zayac vulkan: add LOG operation support for F32 and F16 (llama/17183)
7e090958 hipudding CANN: Use smart pointers to manage ACL objects (llama/17238)
25182a79 ggerganov metal : add cumsum (llama/17305)
8208359a ggerganov metal : faster argsort (llama/17315)
714c1ba1 ggerganov metal : support I32 -> I32 copy (llama/17317)
36b80f63 ggerganov sync : ggml
3e980fd5 ggerganov sync : llama.cpp
danbev approved these changes on 2025-11-17
ggerganov merged b12abefa into master 57 days ago
ggerganov deleted the sync-ggml-25-11-17 branch 57 days ago