llama.cpp
ggml-webgpu: improve FlashAttention performance by software pipelining
#19151

Merged

reeselevine merged 15 commits into ggml-org:master from ArberSephirotheca:zheyuan-fa

webgpu : pipeline flash_attn Q/K loads in WGSL

84ceaca6

ggml-webgpu: unroll Q*K accumulation inner loop

01db8b60

ggml-webgpu: vectorization

57887189

ggml-webgpu: unrolling

e302f1f1

ggml-webgpu: remove redundant unrolling

cfdce2a0

ggml-webgpu: restore the config

eaa26c63

ggml-webgpu: remove redundant comments

8f2daee1

ggml-webgpu: formatting

2bd304ff

ArberSephirotheca requested a review from reeselevine 38 days ago

reeselevine commented on 2026-01-28

ggml-webgpu: formatting and remove vectorization

17eee16d

ggml-webgpu: remove unnecessary constants

faa9a76c

github-actions added ggml

reeselevine commented on 2026-01-28

ggml-webgpu: change QKV buffer to read_write to pass validation

178f85c4

ggml-webgpu: add explanation for the additional bracket around Q K ac…

243a299d

Indentation and for -> if for tail

29f0b88b

reeselevine approved these changes on 2026-01-29

Merge remote-tracking branch 'upstream/master' into zheyuan-fa

e6f15ca4

Kick off CI on wgsl only commits

a150993a

reeselevine requested a review from CISC 36 days ago

github-actions added devops

reeselevine merged bd90fc74 into master 36 days ago

Reviewers

reeselevine

CISC

Assignees

No one assigned

Labels

devops ggml

Milestone

No milestone
