CUDA: add fast walsh-hadamard transform (llama/23615)
93feed1a
metal : add apple device id (llama/23566)
4f4616f6
CUDA: missing PDL sync for FWHT, better fallback (llama/23690)
e1617c53
Check batch_compute_passes before sending passes when not doing GPU p…
fcb5d203
ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MU…
055c98dc
SYCL: implement ggml_sycl_pool_vmm (llama/22862)
2804f2e8
hexagon: add support for CONCAT op (llama/23648)
379dd558
vulkan: optimize conv2d and implement coopmat1 support (llama/22620)
f6daf9df
ggml-zendnn : fixed naming of matmul function (llama/20964)
54e3b501
vulkan: avoid preferring transfer queue on AMD UMA devices (llama/22455)
b7a75c4c
CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (llama/23742)
15dfd80c
vulkan: add REPEAT op support for f16 to f16. (llama/23298)
c0db79f1
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul …
97c34fad
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (llama/22887)
0a668d3e
ggml-webgpu: Fix how to dispatch WG to some ops (llama/23750)
c1955d29
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (llama/23647)
f421cccd
ggml-webgpu: remove legacy constants (llama/23672)
83a81ea4
opencl: OP_GATED_DELTA_NET (llama/23312)
65f003cb
Hexagon: OP_GATED_DELTA_NET K>1 support (llama/23531)
0ef8dcbd
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (llama/22841)
89ddd831
cuda : fix KQ mask offset integer overflow in fattn MMA kernel (llama…
5b203a2e
vulkan: Fix memory logger unsafe iterator access (llama/23667)
05112efd
vulkan: fix wrong index variable in inner loop (llama/23665)
192e0268
vulkan: fast path for walsh-hadamard transform (llama/23687)
6b1d8244
hexagon: minor refresh for HMX FA and MM (llama/23796)
f2fa4aec
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ll…
0f8257df
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for ……
b09043d9
ggml: auto apply iGPU flag CUDA/HIP if integrated device (llama/23007)
2fa11435
opencl: move backend info printing into its own function (llama/23702)
e85d51ee
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (lla…
d7dcb53e
meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear …
06ab7111
cuda : disables launch_fattn PDL enrollment due to compiler bug (llam…
1c9c8f8a
sync : ggml
159ce754
talk-llama : sync llama.cpp
c42d328f
ggml : bump version to 0.13.1 (ggml/1523)
3ff1ccd0
sync : ggml
6d609ea0
ggerganov merged f24588a2 into master 21 days ago
ggerganov deleted the sync-ggml-26-05-29 branch 21 days ago