Add MEA+decode test cases for ONNX Attention LLM op
Add 5 new test classes that exercise the Memory Efficient Attention (MEA)
kernel path during the decode phase (with past KV cache):
1. TestONNXAttentionMemoryEfficientGQA.test_gqa_past_memory_efficient
- GQA + MEA + Decode: the critical missing test case
2. TestONNXAttentionPaddingMaskMemoryEfficientGQA.test_gqa_past_padding_mea
- GQA + MEA + Decode + Bool Padding Mask
3. TestONNXAttentionGQAFloatMaskDecode.test_gqa_past_float_mask_4d
- GQA + MEA + Decode + Float Mask (was a hard error before the code fix)
4. TestONNXAttentionMHAPastMEA.test_mha_past_mea
- MHA + MEA + Decode (explicit MEA path via ORT_DISABLE_FLASH_ATTENTION=1)
5. TestONNXAttentionMemoryEfficientGQABF16.test_gqa_past_memory_efficient_bf16
- BF16 + MEA + Decode
All tests follow the existing patterns: they reuse the same parity-check
functions (parity_check_gqa_past, parity_check_gqa_past_with_padding,
parity_check_mha_past) and test-case generators (gqa_past_test_cases,
gqa_past_padding_test_cases, mha_past_test_cases), and they force the MEA
kernel path via @patch.dict(os.environ, {"ORT_DISABLE_FLASH_ATTENTION": "1"}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (b0ebe545) [claude-opus-4.6]