onnxruntime
c9726045 - [Core] MobileClip Attention Fusion (#27883)

Commit
33 days ago
### Description

Update the Attention Fusion optimizer so that it fuses the Attention subgraph pattern found in the MobileClip model. The performance gain from the fusion itself is small and comes mostly from launching fewer kernels. The real gain will come after this fusion, from tuning the multi-head attention (MHA) kernel for the problem shapes seen in this model. The model contains 2 Attention blocks, and this update fuses both of them.

### Motivation and Context

Improve the performance of the MobileClip model.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
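### Illustration (not part of the commit)

To make the "fewer kernel launches" point concrete, here is a minimal pure-Python sketch of what fusing an attention subgraph means numerically. It is not ONNX Runtime code and does not reproduce the MobileClip subgraph pattern, which this commit message does not spell out. The function names (`unfused_attention`, `fused_attention`) and the single-head, unmasked, scaled dot-product form are assumptions chosen for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scale(m, s):
    return [[x * s for x in row] for row in m]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def unfused_attention(q, k, v):
    """Attention as a chain of separate ops (hypothetical unfused graph).

    Each step stands in for one graph node, i.e. one kernel launch.
    """
    scores = matmul(q, transpose(k))                      # MatMul
    scaled = scale(scores, 1.0 / math.sqrt(len(q[0])))    # Mul (scale)
    probs = softmax_rows(scaled)                          # Softmax
    return matmul(probs, v)                               # MatMul


def fused_attention(q, k, v):
    """The same math done in one pass per query row (hypothetical fused node)."""
    inv_sqrt_d = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        logits = [inv_sqrt_d * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        mx = max(logits)
        weights = [math.exp(x - mx) for x in logits]
        total = sum(weights)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.5, 0.5]]
    k = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
    v = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [4.0, 4.0, 4.0]]
    a = unfused_attention(q, k, v)
    b = fused_attention(q, k, v)
    # Largest element-wise difference between the two formulations.
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Both functions compute the same result up to floating-point rounding. The fused form simply avoids materializing and re-reading the intermediate score and probability matrices between steps, which is why the commit expects most of the speedup to come later, from tuning the fused kernel itself.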