onnxruntime
9ab6b875 - Add GQA fusion for CUDA EP (#24335)

Commit
282 days ago
### Description

Most models can benefit from fusing the pre-GQA nodes into a single MatMul or MatMulNBits. This change detects the patterns that can be fused and performs the fusion on the CUDA EP.

### Motivation and Context

This will enable publishing of a single GPU model going forward.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
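To illustrate why several MatMul nodes can be collapsed into one, here is a minimal sketch in plain Python. It assumes the "pre-GQA nodes" are separate Q, K, and V projection MatMuls that share one input; the commit message does not spell out the matched patterns, so this is an assumption, not the actual ONNX Runtime implementation. All names, shapes, and helper functions below are hypothetical.

```python
# Illustrative only: assumes the fused nodes are separate Q/K/V projections.
# None of these helpers are ONNX Runtime APIs.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def concat_columns(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]

def split_columns(mat, widths):
    """Split a matrix into consecutive column blocks of the given widths."""
    blocks, start = [], 0
    for w in widths:
        blocks.append([row[start:start + w] for row in mat])
        start += w
    return blocks

# Hypothetical toy sizes: hidden size 4, Q width 4, K and V width 2
# (fewer key/value heads than query heads, as in grouped-query attention).
x = [[1, 2, 3, 4], [0, -1, 2, 1]]
w_q = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2], [1, 1, 1, 1]]
w_k = [[1, 2], [0, 1], [2, 0], [1, 1]]
w_v = [[0, 1], [1, 0], [1, 1], [2, 3]]

# Unfused graph: three separate MatMul nodes reading the same input.
q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# Fused graph: one MatMul against the column-packed weight, then a split.
w_qkv = concat_columns(w_q, w_k, w_v)
packed = matmul(x, w_qkv)
q_f, k_f, v_f = split_columns(packed, [4, 2, 2])

# Both graphs produce identical projections.
assert (q_f, k_f, v_f) == (q, k, v)
```

Packing the weights ahead of time replaces three kernel launches with one larger MatMul. The same column-packing idea would apply to quantized weights consumed by MatMulNBits, although the packing details there depend on the quantization layout.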