onnxruntime
8ab27d9f - [webgpu] Don't use num_workgroups when use indirect dispatch (#26334)

Commit
202 days ago
This pull request updates the FlashAttention WebGPU implementation to improve support for indirect dispatch. When indirect dispatch is used, the shader now reads the actual workgroup dimensions from an input buffer instead of relying on the built-in num_workgroups variable, which avoids duplication overhead in Dawn/WebGPU. See https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/ComputePassEncoder.cpp;l=275.

This fixes indirect dispatch being slower than normal dispatch for the same program. With this change, phi4 with graph capture enabled runs at 145 tps, up from 125 tps, on an NV 5080.
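The idea can be illustrated with a minimal sketch. It is not the code from this commit: `BuildShaderSketch`, the `dispatch_dims` binding, and the workgroup size of 64 are hypothetical names and values chosen for illustration. It only shows how a shader generator might avoid declaring `@builtin(num_workgroups)` when the dispatch is indirect and read the workgroup count from a storage buffer instead.

```cpp
#include <string>

// Hypothetical helper (not ONNX Runtime's API): builds a WGSL compute shader
// that needs the number of workgroups along x to linearize the workgroup id.
std::string BuildShaderSketch(bool use_indirect_dispatch) {
  std::string src;
  if (use_indirect_dispatch) {
    // Indirect path: read the dispatch size from a buffer that was filled
    // with the same values as the indirect dispatch arguments.
    // The built-in is deliberately not declared here.
    src += "@group(0) @binding(0) var<storage, read> dispatch_dims : array<u32, 3>;\n";
    src += "@compute @workgroup_size(64)\n";
    src += "fn main(@builtin(workgroup_id) wg_id : vec3<u32>) {\n";
    src += "  let groups_x = dispatch_dims[0];\n";
  } else {
    // Direct path: the built-in is available and needs no extra binding.
    src += "@compute @workgroup_size(64)\n";
    src += "fn main(@builtin(workgroup_id) wg_id : vec3<u32>,\n";
    src += "        @builtin(num_workgroups) num_wg : vec3<u32>) {\n";
    src += "  let groups_x = num_wg.x;\n";
  }
  // Shared body: flatten the 2D workgroup id into a single index.
  src += "  let linear_group = wg_id.y * groups_x + wg_id.x;\n";
  src += "  _ = linear_group;\n";
  src += "}\n";
  return src;
}
```

In this sketch both variants compute the same linear index. Only the source of `groups_x` changes, so the rest of the shader body can stay shared between the direct and indirect paths.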