Fix decoder_attention scratch buffer size and prefast warnings (#14808)
(1) Change `GetScratchBuffer<T>(element_count * element_size)` to
`GetScratchBuffer<T>(element_count)` to avoid allocating more memory
than needed.
(2) Fix prefast:Warning C26451: Arithmetic overflow: Using operator '*'
on a 4 byte value and then casting the result to a 8 byte value.