Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721)
### Description
Some code was accidentally moved into the
`if(!params.is_cross_attention)' block, it must stay outside to work in
both cases.
### Motivation and Context
This causes invalid results. We detected this as a performance bug, as
it caused the EOS early exit to never happen, and the runs would always
take max_length to complete which was slow.