onnxruntime
09e5724f - [CUDA] Fix beam search of num_beams > 32 (#23599)

Commit
348 days ago
### Description

* Pass topk_scores to beam scorer in slow topk path.
* Add an env variable `ORT_BEAM_SEARCH_USE_FAST_TOPK` to enable/disable fast topk.
* Add a test case for slow topk path.

### Motivation and Context

This bug was introduced in https://github.com/microsoft/onnxruntime/pull/16272

Beam search uses a fast CUDA kernel when the number of beams is <= 32. When the beam size exceeds that threshold, another code path (a slower CUDA kernel) is used to get topk. In this `slow topk path`, topk_scores must be passed to the beam scorer, but it was not.

This bug causes incorrect results when num_beams > 32. It was not found previously because such large beam sizes are rarely used.
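The path selection described above can be sketched in a few lines of Python. This is an illustration only, not the onnxruntime implementation: the helper names `use_fast_topk`, `reference_topk`, and `beam_step` are hypothetical, and the assumption that setting `ORT_BEAM_SEARCH_USE_FAST_TOPK` to `0` disables the fast path is a guess at the variable's semantics. Only the threshold of 32 and the variable name come from the commit message.

```python
import os

FAST_TOPK_MAX_BEAMS = 32  # threshold stated in the commit message


def use_fast_topk(num_beams, env=None):
    """Return True when the fast top-k path would be chosen (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: "0" disables the fast path; unset or any other value keeps it enabled.
    enabled = env.get("ORT_BEAM_SEARCH_USE_FAST_TOPK", "1") != "0"
    return enabled and num_beams <= FAST_TOPK_MAX_BEAMS


def reference_topk(scores, k):
    """Plain top-k: returns (topk_scores, topk_indices), highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order


def beam_step(scores, num_beams, env=None):
    """Pick a path and return what a beam scorer needs from this step."""
    path = "fast" if use_fast_topk(num_beams, env) else "slow"
    topk_scores, topk_indices = reference_topk(scores, num_beams)
    # Whichever path runs, the scorer needs topk_scores as well as the indices;
    # the commit message describes the slow path omitting the scores.
    return path, topk_scores, topk_indices
```

Forcing the slow path with a small beam count, as in `beam_step(scores, 2, {"ORT_BEAM_SEARCH_USE_FAST_TOPK": "0"})`, mirrors the idea behind the added test case: the slow path can be exercised without needing more than 32 beams.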