llama.cpp
ggml-webgpu: add vectorized flash attention
#20709
Open

ArberSephirotheca naive vectorized version
976ebc63
ArberSephirotheca add vectorized flash attention
94abbac1
ArberSephirotheca update vec version
10330856
ArberSephirotheca remove unused path and shader
c307a4bf
ArberSephirotheca remove unused helper functions
f8e317c4
ArberSephirotheca add comments
52709dd4
ArberSephirotheca remove pad path
df6ef45a
ArberSephirotheca ggml-webgpu: fix flash-attn vec nwg=1 path and tighten vec specializa…
838306f4
ArberSephirotheca change back to vec4
d61ec8f2
ArberSephirotheca enable multi split
042a1a56
ArberSephirotheca enable vec path when:
b61e63d8
ArberSephirotheca update flash_attn_vec_split.wgsl to reduce redundant workgroup barrie…
36027435
ArberSephirotheca enable vec path for q4 and q8
356d6ff6
ArberSephirotheca flash-attn vec nwg=1 fast path (skip tmp/reduce staging)
1ae041d4
ArberSephirotheca use packed f16 K loads in flash-attn vec split
33a547e1
ArberSephirotheca use packed f16 K loads in flash-attn vec split on host side
638c49b4
ArberSephirotheca tune flash-attn vec f16 VEC_NE by head dim
0abac398
ArberSephirotheca cleanup
83a42b36
ArberSephirotheca cleanup
2595b1ac
ArberSephirotheca keep host side clean
25096b9c
ArberSephirotheca cleanup host side
3d6bfe02
ArberSephirotheca change back to original host wait/submit behavior
68fa2726
ArberSephirotheca formatting
5065dc67
ArberSephirotheca reverted param-buffer pool refactor
03d0625f
ArberSephirotheca add helper functions
5dd2a4b1
ArberSephirotheca requested a review 15 days ago
github-actions added ggml
github-actions added WebGPU
reeselevine commented on 2026-03-18
ArberSephirotheca ggml-webgpu: move flash-attn vec pipeline caching back into shader lib
1e0d856b
ArberSephirotheca ggml-webgpu: remove duplicate functions
88bf3525
ArberSephirotheca ggml-webgpu: reserve flash-attn vec scratch in dst buffer allocation
5c2fefea
ArberSephirotheca ggml-webgpu: revert unrelated change
59aa7d88
ArberSephirotheca ggml-webgpu: revert deleted comment
cac85006
ArberSephirotheca Merge branch 'master' into backup/subgroup_size_agnostic_rebased_ggml…
ff11e389
ArberSephirotheca disable uniformity check
4e0100bb
ArberSephirotheca remove unnecessary change
56fee6e2
reeselevine commented on 2026-04-02
reeselevine Update ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
29c09c2e
reeselevine Update ggml/src/ggml-webgpu/ggml-webgpu.cpp
f40c9e75
