onnxruntime
Add Memory Efficient Attention decode support and tests for ONNX Attention
#27851
Open


titaiwangms wants to merge 8 commits into main from feature/mea-decode-support
titaiwangms Add MEA+decode test cases for ONNX Attention LLM op
719964ae
titaiwangms Add MEA+decode support in ONNX Attention LLM op
6ab008b7
titaiwangms Fix MEA eligibility: skip decode when head_size != v_head_size
f5befa80
titaiwangms Fix review findings: use v_head_size for V ops, add safety comment
53bddd27
titaiwangms Fix test review findings for MEA+decode tests
7318e243
titaiwangms Zero present buffers before concat to prevent NaN propagation
c2da4b12
titaiwangms Add asymmetric head_size regression test for MEA fallback
aadf5da1
titaiwangms Fix FURB110 lint: use `or` instead of ternary for v_head_size
0bdde29d
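The last commit's lint fix can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the PR; the sketch only assumes that `v_head_size` is optional and falls back to `head_size`, as the commit title suggests. FURB110 flags a ternary whose condition repeats its own value, because `or` expresses the same fallback.

```python
# Hypothetical helpers illustrating the FURB110 pattern; names are not from the PR.

def v_head_size_ternary(head_size, v_head_size=None):
    # Flagged form: the condition repeats the value being returned.
    return v_head_size if v_head_size else head_size


def v_head_size_or(head_size, v_head_size=None):
    # Equivalent form: `or` yields the first truthy operand.
    return v_head_size or head_size


# Both forms agree for a missing, zero, or explicit v_head_size.
for args in [(64, None), (64, 0), (64, 128)]:
    assert v_head_size_ternary(*args) == v_head_size_or(*args)
```

Both forms fall back on any falsy value, not only `None`, so a `v_head_size` of `0` is treated the same as an omitted one.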
titaiwangms changed the title from "Add Memory Efficient Attention decode support and tests for ONNX" to "Add Memory Efficient Attention decode support and tests for ONNX Attention" 11 days ago


Reviewers
No reviews
Assignees
No one assigned