onnxruntime
cd4ac494 - [webgpu] Enable indirect dispatch for flash attention (#26207)

Commit
68 days ago
This pull request adds indirect dispatch support to the WebGPU FlashAttention implementation, so that kernel launch sizes can be derived from runtime sequence lengths. The changes add logic and parameters that propagate sequence length information and indirect dispatch buffers through the attention pipeline, with conditional code paths that keep the existing direct dispatch approach working.

This is part of the work to enable graph capture for phi4: https://github.com/microsoft/onnxruntime/pull/25868
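As a rough illustration of the idea (not code from this commit), the sketch below models what an indirect dispatch consumes. In WebGPU, an indirect dispatch reads three 32-bit unsigned workgroup counts (x, y, z) from a buffer when the command executes, instead of taking them as arguments when the command is encoded. `TILE_SIZE`, `workgroup_count`, `pack_indirect_args`, and `read_indirect_args` are hypothetical names chosen for this example, and the mapping of sequence length to x and head count to y is an assumption. Plain Python stands in for the buffer write, which a real pipeline may perform on the GPU.

```python
import struct

# Hypothetical: number of sequence positions one workgroup processes.
TILE_SIZE = 64


def workgroup_count(total_sequence_length: int, tile_size: int = TILE_SIZE) -> int:
    """Ceiling division: workgroups needed to cover every sequence position."""
    return (total_sequence_length + tile_size - 1) // tile_size


def pack_indirect_args(total_sequence_length: int, num_heads: int) -> bytes:
    """Build the 12-byte payload an indirect dispatch reads:
    three little-endian u32 workgroup counts (x, y, z).

    Assumption for this sketch: x covers the sequence, y covers heads, z is 1.
    """
    x = workgroup_count(total_sequence_length)
    return struct.pack("<3I", x, num_heads, 1)


def read_indirect_args(buffer: bytes, offset: int = 0) -> tuple:
    """Decode the (x, y, z) counts, as the dispatch would at execution time."""
    return struct.unpack_from("<3I", buffer, offset)


if __name__ == "__main__":
    # The same recorded "dispatch" picks up different sizes as the buffer changes.
    for seq_len in (1, 64, 130):
        print(seq_len, read_indirect_args(pack_indirect_args(seq_len, num_heads=32)))
```

Because the counts live in a buffer rather than in the encoded command, the recorded command sequence can stay the same while the sequence length changes between runs, which is the property that makes this approach relevant to graph capture.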