onnxruntime
7a9a6bce - Improve TopP sampling (#14192)

3 years ago
### Description

Improve TopP sampling's filter kernel with cub::scan. It reduces TopP sampling latency from 3.67 to 0.92 for batch size 8 and vocabulary size 51k.
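
To illustrate why a scan fits the filter step, here is a minimal CPU sketch of top-p filtering built on a prefix sum. It is not the kernel from this commit: `TopPKeepMask` is a hypothetical helper, the input is assumed to be already sorted in descending order, and `std::partial_sum` stands in for the GPU scan that CUB would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration only; not an onnxruntime API.
// Assumes `sorted_probs` is sorted in descending order and sums to about 1.
// Returns a mask: true means the token stays in the nucleus, false means it is filtered out.
std::vector<bool> TopPKeepMask(const std::vector<float>& sorted_probs, float top_p) {
  std::vector<float> cumulative(sorted_probs.size());
  // Inclusive prefix sum over the sorted probabilities.
  // On the GPU, this is the step a parallel scan would replace.
  std::partial_sum(sorted_probs.begin(), sorted_probs.end(), cumulative.begin());

  std::vector<bool> keep(sorted_probs.size());
  for (std::size_t i = 0; i < sorted_probs.size(); ++i) {
    // Probability mass accumulated strictly before token i.
    const float mass_before = (i == 0) ? 0.0f : cumulative[i - 1];
    // Keep the token while the preceding mass is below top_p,
    // so the token that crosses the threshold is still retained.
    keep[i] = mass_before < top_p;
  }
  return keep;
}
```

With probabilities `{0.5, 0.3, 0.15, 0.05}` and `top_p = 0.75`, the first two tokens are kept: the mass before the third token is already 0.8. Once the cumulative sums exist, each token's decision depends only on its own prefix value, which is why the filter parallelises well after a scan.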