[webgpu] Enable indirect dispatch for flash attention (#26207)
This pull request introduces support for indirect dispatch in the WebGPU
FlashAttention implementation, enabling more dynamic and efficient
kernel launches based on runtime sequence lengths. The changes add new
logic and parameters to propagate sequence length information and
indirect dispatch buffers through the attention pipeline, with
conditional code paths to maintain compatibility with the existing
direct dispatch approach.
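
For readers unfamiliar with indirect dispatch: the workgroup counts are read from a GPU buffer holding three consecutive 32-bit unsigned integers (x, y, z) instead of being passed as call arguments. The sketch below only illustrates that layout and how the counts could follow from a runtime sequence length. It runs on the CPU for clarity. `IndirectDispatchArgs`, `CeilDiv`, `ComputeDispatchArgs`, and the one-workgroup-per-tile-per-head mapping are hypothetical and are not taken from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical type, not from the PR: the three u32 workgroup counts
// {x, y, z} in the order an indirect dispatch buffer stores them.
using IndirectDispatchArgs = std::array<uint32_t, 3>;

// Integer ceiling division for unsigned values; b must be non-zero.
constexpr uint32_t CeilDiv(uint32_t a, uint32_t b) {
  return (a + b - 1) / b;
}

// Assumed mapping (illustrative only): one workgroup processes `tile_size`
// query positions of a single attention head.
inline IndirectDispatchArgs ComputeDispatchArgs(uint32_t sequence_length,
                                                uint32_t num_heads,
                                                uint32_t tile_size) {
  assert(tile_size > 0);
  return {CeilDiv(sequence_length, tile_size), num_heads, 1u};
}
```

With a direct dispatch, the host must know these counts when it records the command. With an indirect dispatch, the recorded command only references the buffer, so the counts can change between runs without re-recording.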
This change is part of the work to enable graph capture for phi4:
https://github.com/microsoft/onnxruntime/pull/25868