onnxruntime
5f087c41
- optimized qmoe code path for 1 token (#27383)
Commit
10 days ago
optimized qmoe code path for 1 token (#27383)

avoids gpu -> cpu copy in qmoe and removes 1 of 6 shaders in qmoe. This improves token generation on gpt-oss-20b by ~15%

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
References
#27383 - optimized qmoe code path for 1 token
Author
guschmue
Parents
3db53eb0