onnxruntime
722743c0 - Add MHA fusion for Nemotron speech conformer encoder (#27764)

Commit
31 days ago
Add MHA fusion for Nemotron speech conformer encoder (#27764)

### Description

This PR updates the pattern matching used for multi-head attention (MHA) fusion so that it applies to the conformer encoder inside [Nemotron speech](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b).

<img width="550" height="976" alt="image" src="https://github.com/user-attachments/assets/a194308e-ce69-4128-9389-aae2a64b312f" />

### Motivation and Context

These changes allow the `MultiHeadAttention` op to appear in the encoder ONNX model.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
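One way to confirm that the fusion took effect is to count op types in the exported encoder and look for `MultiHeadAttention`. The snippet below is a minimal sketch and not part of this PR: the helper names and the `encoder.onnx` file name are hypothetical, only top-level graph nodes are inspected (nested subgraphs are not visited), and the demo uses stand-in nodes so it runs without a model file.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(nodes):
    """Count how often each op_type occurs among the given graph nodes."""
    return Counter(node.op_type for node in nodes)


def load_top_level_nodes(model_path):
    """Load an ONNX file and return its top-level graph nodes.

    Hypothetical helper: nodes inside nested subgraphs are not visited.
    """
    import onnx  # imported lazily so count_op_types works without onnx installed

    return list(onnx.load(model_path).graph.node)


# Stand-in nodes for illustration only. A real check would use
# load_top_level_nodes("encoder.onnx"), where the file name is an assumption.
demo_nodes = [
    SimpleNamespace(op_type="MatMul"),
    SimpleNamespace(op_type="MultiHeadAttention"),
    SimpleNamespace(op_type="MultiHeadAttention"),
]
counts = count_op_types(demo_nodes)
print(counts["MultiHeadAttention"])
```

A count of zero on the optimized encoder would suggest the attention pattern was not matched and the subgraph was left unfused.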