onnxruntime
bdf8dc2c - [WebNN EP] Support local attention feature for GQA (#26565)

Commit

243 days ago

[WebNN EP] Support local attention feature for GQA (#26565)

### Description

Support the `local_window_size` attribute in the **GroupQueryAttention** operator. The attribute is designed for sliding window attention and may change the attention mask pattern. When `local_window_size` is not equal to -1, a new attention mask is created as follows to apply the sliding window:

```
condition_1 (old attn_mask) ---> CumSum (axis=3, exclusive=true, reversed=true)
     |                             |
     |                           Lesser <--- local_window_size
     |                             |
LogicalAnd <----------------- condition_2
     |
new attn_mask
```
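The mask diagram can be read as plain array operations. A reversed, exclusive CumSum turns each mask row into a count of allowed keys strictly after each position. `Lesser` against `local_window_size` then keeps only positions with fewer than `local_window_size` allowed keys after them. `LogicalAnd` with condition_1 drops anything the old mask already excluded.

Below is a minimal pure-Python sketch of that reading, not the EP's C++ code or the WebNN API. It assumes condition_1 is a boolean mask whose last axis (axis=3 of the 4-D tensor) runs over key positions. It processes one [query][key] slice rather than the full tensor. The helper names `reverse_exclusive_cumsum`, `apply_local_window`, and `causal_mask` are hypothetical.

```python
from typing import List

# One [query][key] slice of the (assumed) 4-D boolean attention mask.
Mask = List[List[bool]]


def reverse_exclusive_cumsum(row: List[int]) -> List[int]:
    """CumSum with exclusive=true, reversed=true over one row: out[j] = sum(row[j+1:])."""
    out = [0] * len(row)
    running = 0
    for j in range(len(row) - 1, -1, -1):
        out[j] = running  # exclusive: element j itself is not counted
        running += row[j]
    return out


def apply_local_window(condition_1: Mask, local_window_size: int) -> Mask:
    """Follow the diagram: condition_2 = CumSum(condition_1) < local_window_size,
    new mask = condition_1 AND condition_2."""
    if local_window_size == -1:
        # -1 means no sliding window: keep the old mask unchanged.
        return [row[:] for row in condition_1]
    new_mask: Mask = []
    for row in condition_1:
        counts = reverse_exclusive_cumsum([int(v) for v in row])
        condition_2 = [c < local_window_size for c in counts]  # "Lesser"
        new_mask.append([a and b for a, b in zip(row, condition_2)])  # "LogicalAnd"
    return new_mask


def causal_mask(seq_len: int) -> Mask:
    """Hypothetical example input: query q may attend to keys 0..q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


if __name__ == "__main__":
    # Prints 10000, 11000, 01100, 00110, 00011 (one row per line).
    for row in apply_local_window(causal_mask(5), local_window_size=2):
        print("".join("1" if v else "0" for v in row))
```

Applied to a 5x5 causal mask with `local_window_size=2`, the sketch keeps only the last two keys condition_1 allowed for each query. Positions past the end of a query's visible range also pass `Lesser`, because their count is 0. This is why the final `LogicalAnd` with the old mask is required.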

References

#26565 - [WebNN EP] Support local attention feature for GQA

Author

peishenyan

Parents

ff0715d3
