onnxruntime
2cf31a20 - Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721)

Commit
2 years ago
### Description

Some code was accidentally moved into the `if(!params.is_cross_attention)` block. It must stay outside that block so that it runs whether or not cross attention is used.

### Motivation and Context

This causes invalid results. We detected it as a performance bug: the EOS early exit never happened, so runs always went to `max_length` before completing, which was slow.
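
The shape of this kind of misplacement can be illustrated with a minimal host-side C++ sketch. This is not the ONNX Runtime kernel source: only the `params.is_cross_attention` flag comes from the commit message, while `Params`, `stage_q_buggy`, `stage_q_fixed`, and the idea of a "Q staging" step are hypothetical stand-ins chosen for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the kernel parameters.
// Only the flag name is taken from the commit message.
struct Params {
    bool is_cross_attention;
};

// Buggy shape: a step needed in both modes sits inside the
// self-attention-only block, so cross attention never runs it.
std::vector<float> stage_q_buggy(const Params& params, const std::vector<float>& q) {
    std::vector<float> q_staged(q.size(), 0.0f);  // stands in for a buffer that is never filled
    if (!params.is_cross_attention) {
        // Self-attention-only work would go here.
        q_staged = q;  // misplaced: this is needed in both modes
    }
    return q_staged;
}

// Fixed shape: the shared step is back outside the conditional.
std::vector<float> stage_q_fixed(const Params& params, const std::vector<float>& q) {
    std::vector<float> q_staged(q.size(), 0.0f);
    if (!params.is_cross_attention) {
        // Self-attention-only work stays here.
    }
    q_staged = q;  // runs for self attention and cross attention alike
    return q_staged;
}
```

In this sketch, `stage_q_buggy` returns the untouched buffer whenever `is_cross_attention` is true, which mirrors how wrong Q values could yield invalid outputs and prevent an EOS token from ever being produced.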