llama.cpp
ggml-webgpu: improve flashAttention performance by software pipelining
#19151
Merged

ArberSephirotheca
ArberSephirotheca webgpu : pipeline flash_attn Q/K loads in WGSL
84ceaca6
ArberSephirotheca ggml-webgpu: unroll Q*K accumulation inner loop
01db8b60
ArberSephirotheca ggml-webgpu: vectorization
57887189
ArberSephirotheca ggml-webgpu: unrolling
e302f1f1
ArberSephirotheca ggml-webgpu: remove redundant unrolling
cfdce2a0
ArberSephirotheca ggml-webgpu: restore the config
eaa26c63
ArberSephirotheca ggml-webgpu: remove redundant comments
8f2daee1
ArberSephirotheca ggml-webgpu: formatting
2bd304ff
ArberSephirotheca requested a review from reeselevine 38 days ago
reeselevine commented on 2026-01-28
ArberSephirotheca ggml-webgpu: formatting and remove vectorization
17eee16d
ArberSephirotheca ggml-webgpu: remove unnecessary constants
faa9a76c
github-actions added ggml
reeselevine commented on 2026-01-28
ArberSephirotheca ggml-webgpu: change QKV buffer to read_write to pass validation
178f85c4
ArberSephirotheca ggml-webgpu: add explanation for the additional bracket around Q K ac…
243a299d
reeselevine Indentation and for -> if for tail
29f0b88b
reeselevine approved these changes on 2026-01-29
reeselevine Merge remote-tracking branch 'upstream/master' into zheyuan-fa
e6f15ca4
reeselevine
jeffbolznv
reeselevine Kick off CI on wgsl only commits
a150993a
reeselevine requested a review from CISC 36 days ago
reeselevine
github-actions added devops
reeselevine merged bd90fc74 into master 36 days ago
