Add GQA fusion for CUDA EP (#24335)

Commit

282 days ago

### Description

Most models can benefit from fusing the pre-GQA nodes into a single MatMul or MatMulNBits. This change detects the patterns that can be fused and performs the fusion on the CUDA EP.

### Motivation and Context

This will enable publishing a single GPU model going forward.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
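As a rough illustration of why such a fusion is valid, the sketch below assumes the pre-GQA nodes are three separate Q, K, and V projection MatMuls that share one input. The commit message does not list the exact patterns, so this is an assumption. It uses plain Python lists rather than any onnxruntime API, and every name in it (`matmul`, `concat_columns`, `split_columns`, the dimensions) is hypothetical.

```python
import random

def matmul(a, b):
    """Multiply a (m x k) by b (k x n); both are lists of rows."""
    return [[sum(row[i] * b[i][j] for i in range(len(b)))
             for j in range(len(b[0]))] for row in a]

def concat_columns(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [[v for m in mats for v in m[r]] for r in range(len(mats[0]))]

def split_columns(mat, widths):
    """Split a matrix into column blocks of the given widths."""
    parts, start = [], 0
    for w in widths:
        parts.append([row[start:start + w] for row in mat])
        start += w
    return parts

def random_matrix(rng, rows, cols):
    # Small integers keep the comparison exact (no floating-point rounding).
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

# Hypothetical GQA shapes: fewer K/V heads than Q heads.
hidden, num_heads, kv_num_heads, head_dim = 8, 4, 2, 2
q_width = num_heads * head_dim       # 8
kv_width = kv_num_heads * head_dim   # 4

rng = random.Random(0)
x = random_matrix(rng, 3, hidden)            # 3 tokens
w_q = random_matrix(rng, hidden, q_width)
w_k = random_matrix(rng, hidden, kv_width)
w_v = random_matrix(rng, hidden, kv_width)

# Unfused: three MatMul nodes reading the same input.
q_sep, k_sep, v_sep = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# Fused: one MatMul against the column-concatenated weight, then a split.
w_qkv = concat_columns(w_q, w_k, w_v)
q_fused, k_fused, v_fused = split_columns(
    matmul(x, w_qkv), [q_width, kv_width, kv_width])

fusion_matches = (q_sep, k_sep, v_sep) == (q_fused, k_fused, v_fused)
print("fused output matches separate projections:", fusion_matches)
```

The equivalence holds because each output column of a matrix product depends only on the matching weight column, so concatenating the weights offline and splitting the result reproduces the three projections with a single kernel launch.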

References

#24335 - Add GQA fusion for CUDA EP

Author

nenad1002

Parents

171e297f

onnxruntime 9ab6b875 - Add GQA fusion for CUDA EP (#24335)
