onnxruntime
719964ae - Add MEA+decode test cases for ONNX Attention LLM op

8 days ago
Add MEA+decode test cases for ONNX Attention LLM op

Add 5 new test classes that exercise the Memory Efficient Attention (MEA) kernel path during the decode phase (with past KV cache):

1. TestONNXAttentionMemoryEfficientGQA.test_gqa_past_memory_efficient - GQA + MEA + Decode: the critical missing test case
2. TestONNXAttentionPaddingMaskMemoryEfficientGQA.test_gqa_past_padding_mea - GQA + MEA + Decode + Bool Padding Mask
3. TestONNXAttentionGQAFloatMaskDecode.test_gqa_past_float_mask_4d - GQA + MEA + Decode + Float Mask (was a hard error before the code fix)
4. TestONNXAttentionMHAPastMEA.test_mha_past_mea - MHA + MEA + Decode (explicit MEA path via ORT_DISABLE_FLASH_ATTENTION=1)
5. TestONNXAttentionMemoryEfficientGQABF16.test_gqa_past_memory_efficient_bf16 - BF16 + MEA + Decode

All tests follow the existing patterns: they reuse the same parity-check functions (parity_check_gqa_past, parity_check_gqa_past_with_padding, parity_check_mha_past) and test-case generators (gqa_past_test_cases, gqa_past_padding_test_cases, mha_past_test_cases), forcing the MEA kernel path via @patch.dict(os.environ, {"ORT_DISABLE_FLASH_ATTENTION": "1"}).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (b0ebe545) [claude-opus-4.6]
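The kernel-forcing pattern the commit describes can be sketched as follows. This is a hypothetical minimal example, not the actual test code from the commit: the real tests reuse onnxruntime's parity-check helpers (parity_check_gqa_past and friends), which are not reproduced here; only the `@patch.dict` environment-patching mechanism shown below is taken from the commit message.

```python
import os
import unittest
from unittest.mock import patch


class TestForcedMEAPath(unittest.TestCase):
    # Setting ORT_DISABLE_FLASH_ATTENTION=1 makes onnxruntime skip the
    # Flash Attention kernel so the Memory Efficient Attention (MEA)
    # path is exercised instead. patch.dict scopes the change to this
    # test and restores os.environ afterwards.
    @patch.dict(os.environ, {"ORT_DISABLE_FLASH_ATTENTION": "1"})
    def test_env_forces_mea_path(self):
        # A real test would build the Attention op with a past KV cache
        # here and call a parity-check helper; this sketch only verifies
        # that the environment patch is active inside the test body.
        self.assertEqual(os.environ["ORT_DISABLE_FLASH_ATTENTION"], "1")


if __name__ == "__main__":
    unittest.main()
```

Because `patch.dict` restores the environment on exit, tests that exercise the Flash Attention path in the same process are unaffected.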