onnxruntime
7cc93cf4 - [webgpu] Apply Flash Attention if sliding window exceeds KV cache length (#25594)

Commit

359 days ago

[webgpu] Apply Flash Attention if sliding window exceeds KV cache length (#25594)

### Description

#25372 adds sliding window support for Group Query Attention, disabling Flash Attention as it's not yet supported. This PR adds a check for the sliding window and applies Flash Attention when the window size exceeds the KV cache length or total sequence length.

### Motivation and Context

See above.
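The check described in the commit message can be sketched as a small predicate. This is an illustrative Python sketch only, not the onnxruntime code: the function name `can_use_flash_attention`, the parameter names, and the convention that a negative window size means "no sliding window" are all assumptions made for the example. The idea is that when the window is larger than the number of cached tokens, the window masks nothing, so the Flash Attention path gives the same result as the windowed path.

```python
def can_use_flash_attention(local_window_size: int, total_sequence_length: int) -> bool:
    """Hypothetical predicate: may Flash Attention be used despite a sliding window?

    Assumptions (not taken from the onnxruntime source):
      - a negative local_window_size means no sliding window is configured;
      - total_sequence_length is the number of tokens currently held in the KV cache.
    """
    if local_window_size < 0:
        # No sliding window at all, so nothing restricts Flash Attention here.
        return True
    # The commit says "exceeds", so the comparison is strict. Whether equality
    # is also safe depends on how the window boundary is defined.
    return local_window_size > total_sequence_length
```

For example, with 128 cached tokens a window of 4096 allows the Flash Attention path, while a window of 64 does not, because the window would then actually mask older tokens.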

References

#25594 - [webgpu] Apply Flash Attention if sliding window exceeds KV cache length

Author

daijh

Parents

a120b4bf
