ggml-webgpu: improve FlashAttention performance by software pipelining #19151
webgpu : pipeline flash_attn Q/K loads in WGSL
84ceaca6
ggml-webgpu: unroll Q*K accumulation inner loop
01db8b60
ggml-webgpu: vectorization
57887189
ggml-webgpu: unrolling
e302f1f1
ggml-webgpu: remove redundant unrolling
cfdce2a0
ggml-webgpu: restore the config
eaa26c63
ggml-webgpu: remove redundant comments
8f2daee1
ggml-webgpu: formatting
2bd304ff
ggml-webgpu: formatting and remove vectorization
17eee16d
ggml-webgpu: remove unnecessary constants
faa9a76c
ggml-webgpu: change QKV buffer to read_write to pass validation
178f85c4
ggml-webgpu: add explanation for the additional bracket around Q K ac…
243a299d
Indentation and for -> if for tail
29f0b88b
Merge remote-tracking branch 'upstream/master' into zheyuan-fa
e6f15ca4
Kick off CI on wgsl only commits
a150993a