ggml-webgpu: add vectorized flash attention #20709
naive vectorized version
976ebc63
add vectorized flash attention
94abbac1
update vec version
10330856
remove unused path and shader
c307a4bf
remove unused helper functions
f8e317c4
add comments
52709dd4
remove pad path
df6ef45a
ggml-webgpu: fix flash-attn vec nwg=1 path and tighten vec specializa…
838306f4
change back to vec4
d61ec8f2
enable multi split
042a1a56
enable vec path when:
b61e63d8
update flash_attn_vec_split.wgsl to reduce redundant workgroup barrie…
36027435
enable vec path for q4 and q8
356d6ff6
flash-attn vec nwg=1 fast path (skip tmp/reduce staging)
1ae041d4
use packed f16 K loads in flash-attn vec split
33a547e1
use packed f16 K loads in flash-attn vec split on host side
638c49b4
tune flash-attn vec f16 VEC_NE by head dim
0abac398
cleanup
83a42b36
cleanup
2595b1ac
keep host side clean
25096b9c
cleanup host side
3d6bfe02
change back to original host wait/submit behavior
68fa2726
formatting
5065dc67
reverted param-buffer pool refactor
03d0625f
add helper functions
5dd2a4b1
ggml-webgpu: move flash-attn vec pipeline caching back into shader lib
1e0d856b
ggml-webgpu: remove duplicate functions
88bf3525
ggml-webgpu: reserve flash-attn vec scratch in dst buffer allocation
5c2fefea
ggml-webgpu: revert unrelated change
59aa7d88
ggml-webgpu: revert deleted comment
cac85006
Merge branch 'master' into backup/subgroup_size_agnostic_rebased_ggml…
ff11e389
disable uniformity check
4e0100bb
remove unnecessary change
56fee6e2
Update ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
29c09c2e
Update ggml/src/ggml-webgpu/ggml-webgpu.cpp
f40c9e75