llama.cpp
ggml webgpu: initial flashattention implementation #18610
Merged

reeselevine merged 9 commits into ggml-org:master from reeselevine:master

reeselevine FlashAttention (#13)
36b5e5cc
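
For context, the kernel this commit adds is built on the streaming ("online") softmax recurrence that FlashAttention uses to compute softmax(QK^T)V in one pass without materializing the full attention matrix. A minimal scalar sketch of that recurrence, in C++ rather than the PR's WGSL, with all names illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative scalar sketch of the online-softmax loop a FlashAttention
// kernel implements: scores are consumed one at a time, and the running
// max/sum are rescaled so earlier accumulation stays correct.
// This is not the PR's shader, which tiles this work across a workgroup.
void attn_row(const float * q,   // one query row [head_dim]
              const float * k,   // keys   [n_kv x head_dim]
              const float * v,   // values [n_kv x head_dim]
              float       * out, // output [head_dim]
              int n_kv, int head_dim, float scale) {
    float m = -INFINITY; // running max of the scores
    float l = 0.0f;      // running sum of exp(score - m)
    std::vector<float> acc(head_dim, 0.0f);

    for (int j = 0; j < n_kv; ++j) {
        float s = 0.0f;
        for (int d = 0; d < head_dim; ++d) {
            s += q[d] * k[j*head_dim + d];
        }
        s *= scale;

        const float m_new = std::max(m, s);
        const float corr  = std::exp(m - m_new); // rescale old accumulator
        const float p     = std::exp(s - m_new);

        l = l*corr + p;
        for (int d = 0; d < head_dim; ++d) {
            acc[d] = acc[d]*corr + p * v[j*head_dim + d];
        }
        m = m_new;
    }
    for (int d = 0; d < head_dim; ++d) {
        out[d] = acc[d] / l;
    }
}
```
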
reeselevine Update to account for default kv cache padding
b6c86244
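
The padding this commit accounts for: llama.cpp rounds the KV length seen by FlashAttention kernels up to an alignment so tile loops need no edge cases, and the padded tail must be masked out of the softmax. A sketch of the idea; the pad constant here is an assumption, not the value llama.cpp actually uses:

```cpp
#include <cmath>

// Padded slots must contribute nothing to the softmax, so their mask value
// is -INFINITY (exp(-inf) == 0). The pad multiple below is illustrative;
// llama.cpp derives the real value from the backend configuration.
constexpr int KV_PAD = 64; // assumption

constexpr int padded_kv_len(int n_kv) {
    return ((n_kv + KV_PAD - 1) / KV_PAD) * KV_PAD;
}
// entries j in [n_kv, padded_kv_len(n_kv)) are masked with -INFINITY
```
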
github-actions added the ggml label
reeselevine formatting shader
e5bf2d5f
reeselevine force-pushed from ab90db0f to e5bf2d5f 34 days ago
reeselevine requested a review from ggerganov 33 days ago
reeselevine requested a review from jeffbolznv 33 days ago
reeselevine Add workflow for ggml-ci webgpu
e01f7850
reeselevine requested a review from CISC 32 days ago
reeselevine Try passing absolute path to dawn in ggml-ci
e725774e
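
These two CI commits wire the WebGPU backend into ggml-ci and point the build at a prebuilt Dawn. A hedged sketch of the kind of invocation involved; GGML_WEBGPU is the real CMake toggle for this backend, but the Dawn location and variable are assumptions:

```sh
# Hypothetical ggml-ci leg for the WebGPU backend. A relative Dawn path
# broke in CI, hence the commit switching to an absolute one.
cmake -B build \
    -DGGML_WEBGPU=ON \
    -DCMAKE_PREFIX_PATH="$HOME/dawn/install"   # absolute, not ../dawn
cmake --build build -j
```
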
github-actions added the devops label
reeselevine Avoid error on device destruction, add todos for proper cleanup
1eb1588c
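
On the cleanup commit: WebGPU objects have to be released child-before-parent, or teardown can surface device-lost/uncaptured errors. A hedged sketch of the ordering using Dawn's webgpu.h C API; the struct and its fields are illustrative, and exact error-callback handling varies by Dawn version:

```cpp
#include <webgpu/webgpu.h>

// Illustrative teardown order: resources, then queue, then device, then
// instance. Releasing the device while buffers are still alive is what
// tends to trip error callbacks during shutdown.
struct webgpu_ctx {
    WGPUInstance instance = nullptr;
    WGPUDevice   device   = nullptr;
    WGPUQueue    queue    = nullptr;
    WGPUBuffer   staging  = nullptr; // stand-in for the backend's pools

    ~webgpu_ctx() {
        if (staging)  wgpuBufferRelease(staging);
        if (queue)    wgpuQueueRelease(queue);
        if (device)   wgpuDeviceRelease(device);
        if (instance) wgpuInstanceRelease(instance);
    }
};
```
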
reeselevine Fix unused warning
86c0da6c
reeselevine Forgot one parameter unused
286596a8
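
These two warning fixes use the standard idiom for parameters an interface requires but a function does not use; ggml provides the GGML_UNUSED macro for it. The callback below is illustrative:

```cpp
// GGML_UNUSED is ggml's wrapper around the usual void-cast that silences
// -Wunused-parameter without changing behavior.
#define GGML_UNUSED(x) (void)(x)

static void backend_event_cb(void * user_data, int status) {
    GGML_UNUSED(user_data); // required by the callback signature
    GGML_UNUSED(status);    // the "forgot one parameter" case
}
```
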
reeselevine Move some flashattn computation to f32 for correctness
d8d9a1e4
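
On the f32 commit: Q/K/V can stay f16 in memory, but the softmax statistics should not, since f16 caps at 65504 with about 10 mantissa bits, so the running sum of exponentials drifts over long KV ranges. A sketch of which values such a fix pins to f32; names are illustrative, and the actual change is in the WGSL shader:

```cpp
#include <cmath>

// The running max and running sum from the online-softmax recurrence,
// kept in f32 even when inputs and outputs are f16.
struct softmax_state {
    float m = -INFINITY; // running row max
    float l = 0.0f;      // running sum of exp(score - m)
};

// 'score' is one scaled Q.K dot product, computed or upcast to f32.
// Returns the factor by which the caller rescales its output accumulator.
inline float softmax_step(softmax_state & st, float score) {
    const float m_new = fmaxf(st.m, score);
    const float corr  = expf(st.m - m_new);
    st.l = st.l * corr + expf(score - m_new);
    st.m = m_new;
    return corr;
}
```
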
ggerganov approved these changes on 2026-01-08
CISC approved these changes on 2026-01-08
reeselevine merged 15bff84b into master 31 days ago
