Add Memory Efficient Attention decode support and tests for ONNX Attention #27851
Add MEA+decode test cases for ONNX Attention LLM op
719964ae
Add MEA+decode support in ONNX Attention LLM op
6ab008b7
Fix MEA eligibility: skip decode when head_size != v_head_size
f5befa80
Fix review findings: use v_head_size for V ops, add safety comment
53bddd27
Fix test review findings for MEA+decode tests
7318e243
Zero present buffers before concat to prevent NaN propagation
c2da4b12
Add asymmetric head_size regression test for MEA fallback
aadf5da1
Fix FURB110 lint: use `or` instead of ternary for v_head_size
0bdde29d
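As a note on commit c2da4b12: masking a padded slot to weight 0 does not neutralize a NaN stored in that slot, because `0.0 * NaN` is still NaN under IEEE 754. The sketch below illustrates this in pure Python. It is not the PR's code: the `weighted_sum` helper, the buffer layout, and the numbers are hypothetical, and it assumes the padded region of an unzeroed present buffer may hold NaN bit patterns.

```python
import math


def weighted_sum(weights, values):
    """Plain dot product, standing in for the (attention weights @ V) step."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical 4-slot buffer: two valid entries followed by two padded slots.
# The padded slots are masked, so their attention weight is exactly 0.
weights = [0.25, 0.75, 0.0, 0.0]

dirty_buffer = [1.0, 2.0, float("nan"), float("nan")]  # padding left uninitialized
clean_buffer = [1.0, 2.0, 0.0, 0.0]                    # padding zeroed beforehand

dirty_result = weighted_sum(weights, dirty_buffer)  # NaN leaks through the 0 weight
clean_result = weighted_sum(weights, clean_buffer)  # finite result

print(math.isnan(dirty_result), clean_result)
```

Zeroing the buffer before the concat means the masked positions contribute an exact 0 instead of whatever happened to be in memory.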
titaiwangms changed the title from "Add Memory Efficient Attention decode support and tests for ONNX" to "Add Memory Efficient Attention decode support and tests for ONNX Attention" 11 days ago
Assignees
No one assigned