llama.cpp
ggml-webgpu: updated matrix-vector multiplication
#21738

Merged


reeselevine merged 10 commits into ggml-org:master from reeselevine:k_quant_speedup

merged properly, but slow q3_k and q5_k with u32 indexing

3c36b556

neha-ha requested a review from ggerganov 72 days ago

neha-ha requested a review 72 days ago

github-actions added the ggml label

github-actions added the WebGPU label

Start on new mat-vec

3c9e474c

New format float paths working

0bcf75c1

Working q4_0

01bd9127

Work on remaining legacy q-types

f839c103

port k-quants to new matvec

ba961225

remove old shader

b4b6ffc4

Merge remote-tracking branch 'upstream/master' into k_quant_speedup

83a0d381

reeselevine force pushed from 41259410 to 83a0d381 65 days ago

Remove old constants, format

ca49e73a

reeselevine approved these changes on 2026-04-17

reeselevine requested a review from CISC 65 days ago

reeselevine added the merge ready label

CISC approved these changes on 2026-04-17

remove accidental file

b92011ef

reeselevine approved these changes on 2026-04-19

ggerganov approved these changes on 2026-04-20

reeselevine merged a6cc43c2 into master 62 days ago

Reviewers: ggerganov, reeselevine, CISC

Assignees: No one assigned

Labels: ggml, merge ready, WebGPU

Milestone: No milestone
