onnxruntime
83d11b53 - [WebNN] Support more features for GQA (#27234)

Commit
12 hours ago
[WebNN] Support more features for GQA (#27234)

Add support for GroupQueryAttention with:
- do_rotary=true (cos_cache/sin_cache inputs)
- Packed QKV (optional key/value inputs)
- Optional past_key/past_value for prefill mode
- Remove fp16->fp32 casting workaround

Add ApplyRotaryEmbedding helper function.

Fix decode stage by using qkv_sequence_length to distinguish prefill vs decode, and use runtime seqlens_k instead of static past_sequence_length for rotary position calculation.
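The position calculation described in the message can be illustrated with a small sketch. This is not the code from the commit (the WebNN execution provider builds a graph in C++); it is a plain-Python illustration under stated assumptions: seqlens_k[b] is taken to hold the total sequence length minus one for batch entry b, the rotation uses the non-interleaved (half-split) layout, and cos_cache/sin_cache are indexed as [position][i]. All function names below are hypothetical.

```python
import math


def build_caches(max_pos, half_dim, base=10000.0):
    """Build illustrative cos/sin caches of shape [max_pos][half_dim]."""
    cos_cache, sin_cache = [], []
    for pos in range(max_pos):
        angles = [pos * base ** (-i / half_dim) for i in range(half_dim)]
        cos_cache.append([math.cos(a) for a in angles])
        sin_cache.append([math.sin(a) for a in angles])
    return cos_cache, sin_cache


def token_positions(seqlens_k_b, qkv_sequence_length):
    """Absolute positions of the new tokens for one batch entry.

    Assumption: seqlens_k_b == total_sequence_length - 1, so the number
    of past tokens is derived at runtime instead of from a static
    past_sequence_length.
    """
    past = seqlens_k_b + 1 - qkv_sequence_length
    return [past + s for s in range(qkv_sequence_length)]


def rotate(vec, cos_row, sin_row):
    """Rotate one head vector using the half-split layout."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    return out1 + out2


def apply_rotary_embedding_sketch(x, seqlens_k, cos_cache, sin_cache):
    """x has shape [batch][sequence][head_dim]; one head, for brevity."""
    out = []
    for b, tokens in enumerate(x):
        positions = token_positions(seqlens_k[b], len(tokens))
        out.append([
            rotate(vec, cos_cache[p], sin_cache[p])
            for vec, p in zip(tokens, positions)
        ])
    return out
```

With this formula, a prefill call with four new tokens and seqlens_k[b] = 3 gets positions 0 through 3, while a decode call with one new token and seqlens_k[b] = 7 gets position 7. The same expression covers both cases, which is why the number of new tokens in the current call is enough to tell them apart.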