onnxruntime
fe4b6550 - [WebNN] Improve MultiHeadAttention op implementation (#27494)

Commit
21 hours ago
- Remove the additional FP32 cast nodes and let the underlying backends handle precision issues
- Fix a bug in the tensor-existence check for the attention_bias input
- Make other minor improvements