onnxruntime
5f087c41 - optimized qmoe code path for 1 token (#27383)

Commit
10 days ago
optimized qmoe code path for 1 token (#27383)

Avoids a GPU -> CPU copy in QMoE and removes 1 of the 6 QMoE shaders. This improves token generation on gpt-oss-20b by ~15%.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>