onnxruntime
cd4ac494 - [webgpu] Enable indirect dispatch for flash attention (#26207)

Commit

68 days ago

[webgpu] Enable indirect dispatch for flash attention (#26207)

This pull request adds support for indirect dispatch to the WebGPU FlashAttention implementation, so that kernel launch sizes can be determined from runtime sequence lengths. The changes add new logic and parameters that propagate sequence length information and indirect dispatch buffers through the attention pipeline. Conditional code paths keep the existing direct dispatch approach working.

This is part of the work to enable graph capture in phi4: https://github.com/microsoft/onnxruntime/pull/25868
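To illustrate the idea behind indirect dispatch, here is a minimal sketch of the arithmetic involved. It is not taken from the PR. WebGPU's indirect dispatch reads three 32-bit unsigned workgroup counts (x, y, z) from a GPU buffer instead of taking them as call arguments, so the counts can depend on a sequence length known only at run time. The names `kTileSize`, `CeilDiv`, and `MakeIndirectDispatchArgs`, the tile size of 64, and the mapping of heads to the y dimension are all assumptions made for illustration. In a real pipeline these values would presumably be written by a GPU shader rather than computed on the CPU.

```cpp
#include <array>
#include <cstdint>

// Hypothetical tile size: number of sequence positions one workgroup handles.
// This value is an assumption for illustration, not taken from the PR.
constexpr uint32_t kTileSize = 64;

// Ceiling division for unsigned integers without overflow; d must be non-zero.
constexpr uint32_t CeilDiv(uint32_t n, uint32_t d) {
  return n / d + (n % d != 0 ? 1u : 0u);
}

// Builds the three workgroup counts (x, y, z) that an indirect dispatch
// reads from a buffer. The layout of three consecutive uint32 values matches
// what WebGPU expects; the choice of what each dimension means is hypothetical.
constexpr std::array<uint32_t, 3> MakeIndirectDispatchArgs(uint32_t sequence_length,
                                                           uint32_t num_heads) {
  return {CeilDiv(sequence_length, kTileSize), num_heads, 1u};
}
```

For example, under these assumptions `MakeIndirectDispatchArgs(130, 32)` yields the counts 3, 32, 1. Because the counts live in a buffer, a recorded command sequence can stay unchanged while the sequence length varies, which is what makes this approach compatible with graph capture.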

References

#26207 - [webgpu] Enable indirect dispatch for flash attention

Author

qjia7

Parents

94de31fa
