Fix test review findings for MEA+decode tests
1. Rename TestONNXAttentionGQAFloatMaskDecode to
   TestONNXAttentionMemoryEfficientGQAFloatMaskDecode for searchability
   (all MEA test class names now contain 'MemoryEfficient' or 'MEA')
2. Add present_k/present_v verification to the float mask decode test:
   it now checks that the concatenated KV buffers match the reference,
   not just the attention output
3. Add a comment explaining the std=0.2 scaling (keeps fp16 numerically stable)
4. Add TestONNXAttentionGQA4DBNSHMEA, which exercises the 4D BNSH
   transpose logic through the MEA decode path (use_4d_bnsh=True)
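
For reviewers, a minimal sketch of the kind of present_k check described in
item 2, not the actual test code. It assumes BNSH means (batch, num_heads,
sequence, head_size) and that a decode step appends one new token to the past
cache along the sequence axis. The helper name expected_present and all
shapes are hypothetical.

```python
import numpy as np


def expected_present(past, new):
    """Reference KV buffer: past cache followed by the new tokens (BNSH, axis 2)."""
    return np.concatenate([past, new], axis=2)


rng = np.random.default_rng(0)
batch, kv_heads, past_len, head_size = 2, 2, 5, 8  # hypothetical sizes

# Inputs scaled by std=0.2 so fp16 values stay small, as noted in item 3.
past_k = (rng.standard_normal((batch, kv_heads, past_len, head_size)) * 0.2).astype(np.float16)
new_k_bsnh = (rng.standard_normal((batch, 1, kv_heads, head_size)) * 0.2).astype(np.float16)

# BSNH -> BNSH transpose, the layout change that item 4 exercises.
new_k = new_k_bsnh.transpose(0, 2, 1, 3)

present_k = expected_present(past_k, new_k)

# The buffer must hold the untouched past followed by the new token.
np.testing.assert_array_equal(present_k[:, :, :past_len], past_k)
np.testing.assert_array_equal(present_k[:, :, past_len:], new_k)
```

The same comparison would apply to present_v. A real test would compare the
operator's output buffer against this reference with a tolerance suited to fp16.
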
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (b0ebe545) [claude-opus-4.6]