sync : ggml #3840

ggerganov merged 36 commits into master from sync-ggml-26-05-29
ggerganov
am17an CUDA: add fast walsh-hadamard transform (llama/23615)
93feed1a
forforever73 metal : add apple device id (llama/23566)
4f4616f6
JohannesGaessler CUDA: missing PDL sync for FWHT, better fallback (llama/23690)
e1617c53
nikhilJain17 Check batch_compute_passes before sending passes when not doing GPU p…
fcb5d203
yomaytk ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MU…
055c98dc
sanmai SYCL: implement ggml_sycl_pool_vmm (llama/22862)
2804f2e8
max-krasnyansky hexagon: add support for CONCAT op (llama/23648)
379dd558
jeffbolznv vulkan: optimize conv2d and implement coopmat1 support (llama/22620)
f6daf9df
truecoder34 ggml-zendnn : fixed naming of matmul function (llama/20964)
54e3b501
winstonma vulkan: avoid preferring transfer queue on AMD UMA devices (llama/22455)
b7a75c4c
ORippler CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (llama/23742)
15dfd80c
l8bloom vulkan: add REPEAT op support for f16 to f16. (llama/23298)
c0db79f1
jeffbolznv vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul …
97c34fad
TheBlueMatt vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (llama/22887)
0a668d3e
yomaytk ggml-webgpu: Fix how to dispatch WG to some ops (llama/23750)
c1955d29
max-krasnyansky hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (llama/23647)
f421cccd
reeselevine ggml-webgpu: remove legacy constants (llama/23672)
83a81ea4
ymcki opencl: OP_GATED_DELTA_NET (llama/23312)
65f003cb
ymcki Hexagon: OP_GATED_DELTA_NET K>1 support (llama/23531)
0ef8dcbd
martin-klacer-arm ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (llama/22841)
89ddd831
fairydreaming cuda : fix KQ mask offset integer overflow in fattn MMA kernel (llama…
5b203a2e
winstonma vulkan: Fix memory logger unsafe iterator access (llama/23667)
05112efd
winstonma vulkan: fix wrong index variable in inner loop (llama/23665)
192e0268
jeffbolznv vulkan: fast path for walsh-hadamard transform (llama/23687)
6b1d8244
max-krasnyansky hexagon: minor refresh for HMX FA and MM (llama/23796)
f2fa4aec
jadenmach2 CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ll…
0f8257df
yaohengxu mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for …
b09043d9
fl0rianr ggml: auto apply iGPU flag CUDA/HIP if integrated device (llama/23007)
2fa11435
lhez opencl: move backend info printing into its own function (llama/23702)
e85d51ee
max-krasnyansky hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (lla…
d7dcb53e
TheBlueMatt meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear …
06ab7111
aendk cuda : disables launch_fattn PDL enrollment due to compiler bug (llam…
1c9c8f8a
ggerganov sync : ggml
159ce754
ggerganov talk-llama : sync llama.cpp
c42d328f
ggerganov ggml : bump version to 0.13.1 (ggml/1523)
3ff1ccd0
ggerganov sync : ggml
6d609ea0
ggerganov merged f24588a2 into master 21 days ago
ggerganov deleted the sync-ggml-26-05-29 branch 21 days ago
