onnxruntime
7a9a6bce
- Improve TopP sampling (#14192)
Commit
3 years ago
Improve TopP sampling (#14192)

### Description

Improve TopP sampling's filter kernel with cub::scan. It reduces TopP sampling latency from 3.67 to 0.92 for batch size 8 and vocabulary size 51k.
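The commit message does not show the kernel itself, so the following is only a minimal CPU-side sketch of why a prefix scan fits a top-p filter. It assumes the probabilities are already sorted in descending order and that a token is kept when the probability mass before it is still below `top_p`. The function name `top_p_keep_mask` is hypothetical, and `std::partial_sum` stands in for the GPU scan; this is not the onnxruntime implementation.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of onnxruntime.
// Input: probabilities sorted in descending order (assumed to sum to about 1).
// Output: keep[i] is true when token i belongs to the top-p nucleus.
std::vector<bool> top_p_keep_mask(const std::vector<float>& sorted_probs,
                                  float top_p) {
    const std::size_t n = sorted_probs.size();

    // Inclusive prefix sum: inclusive[i] = sorted_probs[0] + ... + sorted_probs[i].
    // On a GPU this is the step a parallel scan primitive would perform.
    std::vector<float> inclusive(n);
    std::partial_sum(sorted_probs.begin(), sorted_probs.end(), inclusive.begin());

    std::vector<bool> keep(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Probability mass strictly before token i (the exclusive prefix sum).
        const float mass_before = (i == 0) ? 0.0f : inclusive[i - 1];
        // Keep the token while the mass before it has not yet reached top_p,
        // so the first token is always kept for any top_p > 0.
        keep[i] = mass_before < top_p;
    }
    return keep;
}
```

Once the prefix sums are available, each token's keep decision depends only on its own position, which is what makes the filter step easy to parallelize across the vocabulary.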
References
#14192 - Improve TopP sampling
Author
yufenglee
Parents
d92c663f