6e2d45a4 vulkan: fix memory allocations (llama/17122)
ce8d1da2 cuda/vulkan : bicubic interpolation (llama/17022)
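Background for ce8d1da2: bicubic interpolation weights a 4x4 pixel neighborhood with a cubic convolution kernel, applied first along x and then along y. A minimal scalar sketch of the 1-D building blocks, assuming the common Catmull-Rom coefficient a = -0.5 (the shipped CUDA/Vulkan kernels may differ in detail):

    #include <cmath>

    // Cubic convolution weight (Keys, 1981) for a = -0.5; zero outside |t| < 2.
    static float cubic_weight(float t) {
        const float a = -0.5f;
        t = fabsf(t);
        if (t < 1.0f) return (a + 2.0f)*t*t*t - (a + 3.0f)*t*t + 1.0f;
        if (t < 2.0f) return a*(t*t*t - 5.0f*t*t + 8.0f*t - 4.0f);
        return 0.0f;
    }

    // One 1-D pass: interpolate between p1 and p2 at fraction fx in [0,1),
    // using the outer neighbors p0 and p3. Bicubic = this pass over 4 rows,
    // then one more pass over the 4 row results along y.
    static float cubic_interp(float p0, float p1, float p2, float p3, float fx) {
        return p0*cubic_weight(fx + 1.0f) + p1*cubic_weight(fx)
             + p2*cubic_weight(fx - 1.0f) + p3*cubic_weight(fx - 2.0f);
    }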
5c643599 arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_do…
4cd5695c metal : enable tensor API for A19 (llama/17087)
a64712e9 vulkan: fix validation issue introduced by #16868 (llama/17145)
d1a83fbf vulkan: check glslc executable string (llama/17144)
e4c1e3cd ggml-cpu : inspect -march and -mcpu to find the CPU (llama/16333)
4413a561 metal : cap threadgroups size of set_rows (llama/17146)
becc46e7 cpu: skip NOPs to avoid barriers (llama/17133)
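Background for becc46e7: ops that only reinterpret tensor metadata do no work on the CPU backend, so synchronizing threads around them wastes a barrier. A hypothetical sketch of the predicate (the real check lives in ggml-cpu; the exact op set shown here is illustrative):

    #include "ggml.h"

    // Pure view/layout ops move no data, so the compute loop can skip them
    // without inserting a thread barrier before the next real op.
    static bool op_is_noop(enum ggml_op op) {
        switch (op) {
            case GGML_OP_NONE:
            case GGML_OP_RESHAPE:
            case GGML_OP_VIEW:
            case GGML_OP_PERMUTE:
            case GGML_OP_TRANSPOSE:
                return true;
            default:
                return false;
        }
    }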
485e4235 opencl: add fastdiv and use it in set_rows, ported from cuda (llama/1…
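Background for 485e4235: fastdiv replaces integer division by a divisor that is fixed per kernel launch with a multiply-high and a shift, using constants precomputed on the host. A portable sketch of the standard round-up variant (Granlund-Montgomery); the actual CUDA/OpenCL helpers may differ in detail:

    #include <cstdint>

    struct fastdiv_vals { uint32_t mp; uint32_t l; };

    // Precompute (multiplier, shift) for a fixed divisor d > 0.
    static fastdiv_vals fastdiv_init(uint32_t d) {
        uint32_t l = 0;
        while (l < 32 && (uint64_t(1) << l) < d) ++l;  // l = ceil(log2(d))
        const uint32_t mp =
            (uint32_t)(((uint64_t(1) << 32)*((uint64_t(1) << l) - d))/d + 1);
        return { mp, l };
    }

    // q = n / d, computed as (mulhi(n, mp) + n) >> l; on GPUs the mul-high
    // is a single instruction, emulated here with 64-bit arithmetic.
    static uint32_t fastdiv(uint32_t n, fastdiv_vals v) {
        const uint64_t hi = ((uint64_t)n*v.mp) >> 32;
        return (uint32_t)((hi + n) >> v.l);
    }

For example, fastdiv(22, fastdiv_init(7)) yields 3, matching 22/7.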
bee75186 cmake : add version to all shared object files (llama/17091)
2fe28b67 kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993)
abbb5f2a ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/1…
f52e7c75 ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conver…
c3a1298c disable rms norm mul rope for chips with no fp16 rte (llama/17134)
32d1b349 hexagon: various Op fixes (llama/17135)
e9df9581 fix ci crash about SSM_CONV (llama/17169)
2f2c6c3b CANN: Add L2_NORM op support (llama/16856)
6a2c71b9 ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)
a541b0ef ggml : use std::sort in ggml_argsort CPU implementation (llama/17211)
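Background for a541b0ef: ggml_argsort returns the permutation that sorts a row, so the CPU path can delegate to std::sort over an index array (llama/17222 below later turns the comparison into a template parameter so the sort order is resolved at compile time). The pattern in condensed form:

    #include <algorithm>
    #include <cstdint>
    #include <numeric>
    #include <vector>

    // Return the indices that sort `data` ascending, leaving the data in place.
    static std::vector<int32_t> argsort_f32(const float * data, int n) {
        std::vector<int32_t> idx(n);
        std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
        std::sort(idx.begin(), idx.end(),
                  [data](int32_t a, int32_t b) { return data[a] < data[b]; });
        return idx;
    }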
214d1af0 CUDA: static assert to prevent misuse of memcpy_1 (llama/17198)
be4d1303 CUDA: fuse rope + set_rows (llama/16884)
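Background for be4d1303: RoPE rotates each (even, odd) pair of a row by a position-dependent angle, and set_rows scatters rows to destination indices; fusing the two writes the rotated values straight to their target rows instead of materializing an intermediate tensor. A scalar sketch of the combined inner loop (layout and names are illustrative, not the CUDA kernel):

    #include <cmath>
    #include <cstdint>

    // Rotate source row i with RoPE (base 10000) and store the result
    // directly at destination row row_ids[i].
    static void rope_set_row(const float * src, float * dst,
                             const int64_t * row_ids, int i,
                             int n_dims, float pos) {
        const float * x = src + (int64_t)i*n_dims;
        float       * y = dst + row_ids[i]*n_dims;  // fused scatter target
        for (int j = 0; j < n_dims; j += 2) {
            const float theta = pos*powf(10000.0f, -(float)j/n_dims);
            const float c = cosf(theta);
            const float s = sinf(theta);
            y[j + 0] = x[j + 0]*c - x[j + 1]*s;
            y[j + 1] = x[j + 0]*s + x[j + 1]*c;
        }
    }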
c880b430 CANN: Add cross_entropy_loss op support (llama/16886)
9808706d ggml-cpu : use template for argsort (llama/17222)
b6d0ebe2 Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" …
5150c23e metal: accelerated conv2d (llama/17175)
273dd3fe ggml-cpu : add RISC-V vector intrinsic support for silu and cvar oper…
312480c9 sched : fix reserve ignoring user tensor assignments (llama/17232)
1b4c6ad1 vulkan: remove shell call from vulkan-shaders-gen tool, revert file c…
e9b37f56 ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063)
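Background for e9b37f56: the new ops are element-wise or scan primitives. Reference semantics for three of them as plain scalar code (a sketch of the math, not the ggml kernels):

    #include <cmath>

    // SOFTPLUS: log(1 + e^x); for large x it saturates to x, which also
    // avoids overflow in expf.
    static float softplus(float x) {
        return x > 20.0f ? x : log1pf(expf(x));
    }

    // EXPM1: e^x - 1, accurate near zero via the dedicated libm routine.
    static float expm1_op(float x) { return expm1f(x); }

    // CUMSUM: inclusive prefix sum along a row of length n.
    static void cumsum(const float * src, float * dst, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; ++i) {
            acc += src[i];
            dst[i] = acc;
        }
    }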
3e4ae291 ggml-cpu: handle 3d tensors in repack mat_mul (llama/17241)
ae08083e metal : make the FA extra sizes consistent (llama/17143)
a6f1d807 metal : support argsort for ne00 > 1024 (llama/17247)
786e0056 vulkan: change graph_compute to be async and enable get_tensor_async …
9d3fa94c vulkan: skip all-negative-inf blocks in FA (llama/17186)
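Background for 9d3fa94c: if the attention-mask tile covering a KV block is -inf everywhere, every softmax weight in that block is zero, so flash attention can skip the block's QK^T, softmax and V accumulation outright. The test in scalar form (illustrative; the real check runs per workgroup inside the shader):

    #include <cmath>

    // True if the whole mask tile is -inf, i.e. the KV block is fully
    // masked out and contributes nothing to the attention output.
    static bool block_fully_masked(const float * mask, int n) {
        for (int i = 0; i < n; ++i) {
            if (!(std::isinf(mask[i]) && mask[i] < 0.0f)) {
                return false;
            }
        }
        return true;
    }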
a175f857 vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/…
89f82bfe vulkan: implement ABS and NEG (llama/17245)
5ae41738 vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AM…
5d9fba0a vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287)
d7356143 sycl : unify unary kernels with a generic implementation and enable w…
14dac59d opencl: add kernel to handle mat mul in attention to improve encoding…
9c2bde0f opencl: fix rms_norm_mul (llama/17250)
844275a8 metal : remove obsolete asserts (llama/17295)
75cfe4a6 vulkan: fix MMQ quantize_y condition (llama/17301)
4f694e4f vulkan: add LOG operation support for F32 and F16 (llama/17183)
7e090958 CANN: Use smart pointers to manage ACL objects (llama/17238)
25182a79 metal : add cumsum (llama/17305)
8208359a metal : faster argsort (llama/17315)
714c1ba1 metal : support I32 -> I32 copy (llama/17317)
36b80f63 sync : ggml
3e980fd5 sync : llama.cpp
danbev approved these changes on 2025-11-17.
ggerganov merged b12abefa into master.
ggerganov deleted the sync-ggml-25-11-17 branch.