PR #3840 sync : ggml - SemanticDiff

ggerganov merged 36 commits into master from sync-ggml-26-05-29

CUDA: add fast walsh-hadamard transform (llama/23615)

93feed1a

metal : add apple device id (llama/23566)

4f4616f6

CUDA: missing PDL sync for FWHT, better fallback (llama/23690)

e1617c53

Check batch_compute_passes before sending passes when not doing GPU p…

fcb5d203

ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MU…

055c98dc

SYCL: implement ggml_sycl_pool_vmm (llama/22862)

2804f2e8

hexagon: add support for CONCAT op (llama/23648)

379dd558

vulkan: optimize conv2d and implement coopmat1 support (llama/22620)

f6daf9df

ggml-zendnn : fixed naming of matmul function (llama/20964)

54e3b501

vulkan: avoid preferring transfer queue on AMD UMA devices (llama/22455)

b7a75c4c

CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (llama/23742)

15dfd80c

vulkan: add REPEAT op support for f16 to f16. (llama/23298)

c0db79f1

vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul …

97c34fad

vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (llama/22887)

0a668d3e

ggml-webgpu: Fix how to dispatch WG to some ops (llama/23750)

c1955d29

hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (llama/23647)

f421cccd

ggml-webgpu: remove legacy constants (llama/23672)

83a81ea4

opencl: OP_GATED_DELTA_NET (llama/23312)

65f003cb

Hexagon: OP_GATED_DELTA_NET K>1 support (llama/23531)

0ef8dcbd

ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (llama/22841)

89ddd831

cuda : fix KQ mask offset integer overflow in fattn MMA kernel (llama…

5b203a2e

vulkan: Fix memory logger unsafe iterator access (llama/23667)

05112efd

vulkan: fix wrong index variable in inner loop (llama/23665)

192e0268

vulkan: fast path for walsh-hadamard transform (llama/23687)

6b1d8244

hexagon: minor refresh for HMX FA and MM (llama/23796)

f2fa4aec

CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ll…

0f8257df

mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for ……

b09043d9

ggml: auto apply iGPU flag CUDA/HIP if integrated device (llama/23007)

2fa11435

opencl: move backend info printing into its own function (llama/23702)

e85d51ee

hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (lla…

d7dcb53e

meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear …

06ab7111

cuda : disables launch_fattn PDL enrollment due to compiler bug (llam…

1c9c8f8a

sync : ggml

159ce754

talk-llama : sync llama.cpp

c42d328f

ggml : bump version to 0.13.1 (ggml/1523)

3ff1ccd0

sync : ggml

6d609ea0

ggerganov merged f24588a2 into master 21 days ago

ggerganov deleted the sync-ggml-26-05-29 branch 21 days ago

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Milestone

No milestone

whisper.cpp sync : ggml #3840 Merged
