onnxruntime
bdf8dc2c - [WebNN EP] Support local attention feature for GQA (#26565)

Commit
153 days ago
[WebNN EP] Support local attention feature for GQA (#26565)

### Description
<!-- Describe your changes. -->
Support the `local_window_size` attribute in the **GroupQueryAttention** operator, which is designed for sliding window attention and may influence the attention mask pattern.

When `local_window_size` is not equal to -1, a new attention mask pattern is created as follows to apply the sliding window.

```
 condition_1 (old attn_mask) ---> CumSum (axis=3, exclusive=true, reversed=true)
      |                             |
      |                           Lesser <--- local_window_size
      |                             |
  LogicalAnd <----------------- condition_2
      |
 new attn_mask
```

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
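The mask construction in the diagram can be sketched in plain Python. This is an illustration only, not the WebNN EP code: the function names (`reversed_exclusive_cumsum`, `apply_local_window`, `causal_mask`) are hypothetical, the mask is reduced to a 2-D list of booleans (so the diagram's `axis=3` becomes the last axis of each row), and the old mask is assumed to be causal.

```python
def reversed_exclusive_cumsum(row):
    """For each index j, sum the elements strictly after j in the row.

    Mirrors CumSum with exclusive=true and reversed=true along the last axis.
    """
    out = [0] * len(row)
    running = 0
    for j in range(len(row) - 1, -1, -1):
        out[j] = running
        running += row[j]
    return out


def apply_local_window(attn_mask, local_window_size):
    """Combine the old mask (condition_1) with the window test (condition_2)."""
    if local_window_size == -1:
        # -1 means no sliding window: the old mask is used unchanged.
        return [list(row) for row in attn_mask]
    new_mask = []
    for row in attn_mask:
        # Number of attendable positions strictly after each key position.
        counts = reversed_exclusive_cumsum([int(v) for v in row])
        # Lesser: keep positions with fewer than local_window_size keys after them.
        condition_2 = [c < local_window_size for c in counts]
        # LogicalAnd of condition_1 and condition_2 gives the new mask.
        new_mask.append([bool(a) and b for a, b in zip(row, condition_2)])
    return new_mask


def causal_mask(seq_len):
    """Hypothetical old mask: query i may attend to keys j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    for row in apply_local_window(causal_mask(4), 2):
        print(["x" if v else "." for v in row])
```

With a causal old mask, the reversed exclusive cumulative sum at key position `j` of query row `i` equals `i - j`, so the `Lesser` test keeps only the most recent `local_window_size` keys in each row.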