onnxruntime
7cc93cf4 - [webgpu] Apply Flash Attention if sliding window exceeds KV cache length (#25594)

Commit
146 days ago
### Description

PR #25372 adds sliding window support for Group Query Attention and disables Flash Attention, which does not yet support sliding windows. This PR adds a check for the sliding window and applies Flash Attention when the window size exceeds the KV cache length or total sequence length.

### Motivation and Context

See above.
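The check described above can be illustrated with a minimal sketch. This is not the WebGPU kernel code from the PR. The function names, the use of `None` for "no sliding window", and the strict `>` comparison (taken from the wording "exceeds") are all assumptions made for illustration. The sketch also shows why the check is safe: when the window is larger than the total sequence length, the sliding-window mask is identical to the plain causal mask, so the window has no effect.

```python
from typing import List, Optional


def causal_mask(total_len: int) -> List[List[bool]]:
    """Query i may attend to key j when j <= i."""
    return [[j <= i for j in range(total_len)] for i in range(total_len)]


def sliding_window_mask(total_len: int, window: int) -> List[List[bool]]:
    """Causal mask that additionally limits each query to the last `window` keys."""
    return [
        [j <= i and i - j < window for j in range(total_len)]
        for i in range(total_len)
    ]


def can_use_flash_attention(window: Optional[int], total_len: int) -> bool:
    """Hypothetical eligibility check (names and sentinel are assumptions).

    window is None  -> no sliding window configured, so nothing blocks the fast path.
    window > total  -> the window covers every cached key, so it is a no-op.
    """
    return window is None or window > total_len


if __name__ == "__main__":
    total = 8
    for w in (None, 4, 16):
        print(f"window={w}, total={total}: flash={can_use_flash_attention(w, total)}")
```

Under these assumptions, a window of 16 over 8 total tokens takes the Flash Attention path, while a window of 4 falls back to the path that applies the window explicitly.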