onnxruntime
6ab008b7 - Add MEA+decode support in ONNX Attention LLM op

Commit
29 days ago
Add MEA+decode support in ONNX Attention LLM op

Enable Memory Efficient Attention (cutlass FMHA) to handle decode steps
with past_key/past_value, previously restricted to Flash only.

Changes:
- Add LaunchConcatNewToPastKV before MEA dispatch to concatenate
  past_key+K into present_key (and past_value+V into present_value),
  following the same pattern as the Flash decode path
- Remove past_key==nullptr eligibility check from mea_eligible
- Track kv_is_bsnh separately from is_bsnh since present buffers are
  always BNSH after concat; pass kv_is_bsnh to LaunchUngroup and MEA
  params for correct stride computation
- Set present_kv_already_populated=true after concat to skip the
  redundant post-attention present_key/value copy
- Enforce head_size==v_head_size for MEA decode
  (LaunchConcatNewToPastKV uses a single head_size parameter)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (16a065d8) [claude-opus-4.6]
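The concat step described in the change list can be illustrated with a small CPU reference. This is only a sketch under stated assumptions: `ConcatNewToPastRef` and its parameters are hypothetical names chosen for illustration, not the signature of `LaunchConcatNewToPastKV`, and the real code runs as a CUDA kernel. It shows why the output is always BNSH regardless of the layout of the incoming K/V, which is the reason the commit tracks `kv_is_bsnh` separately from `is_bsnh`. It also takes a single head size `H`, mirroring the `head_size==v_head_size` restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (illustration only, not the ONNX Runtime kernel).
//   past    : [B, N, P, H]  (BNSH), P = past sequence length (may be 0)
//   new_kv  : [B, S, N, H]  if new_is_bsnh, otherwise [B, N, S, H]
//   returns : [B, N, P + S, H], always BNSH
std::vector<float> ConcatNewToPastRef(const std::vector<float>& past,
                                      const std::vector<float>& new_kv,
                                      std::size_t B, std::size_t N,
                                      std::size_t P, std::size_t S,
                                      std::size_t H, bool new_is_bsnh) {
  assert(past.size() == B * N * P * H);
  assert(new_kv.size() == B * S * N * H);

  const std::size_t T = P + S;  // total sequence length after concat
  std::vector<float> present(B * N * T * H);

  for (std::size_t b = 0; b < B; ++b) {
    for (std::size_t n = 0; n < N; ++n) {
      for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t h = 0; h < H; ++h) {
          const std::size_t dst = ((b * N + n) * T + t) * H + h;
          if (t < P) {
            // Copy cached tokens; past is already BNSH.
            present[dst] = past[((b * N + n) * P + t) * H + h];
          } else {
            // Copy new tokens, converting from BSNH when required.
            const std::size_t s = t - P;
            const std::size_t src =
                new_is_bsnh ? ((b * S + s) * N + n) * H + h
                            : ((b * N + n) * S + s) * H + h;
            present[dst] = new_kv[src];
          }
        }
      }
    }
  }
  return present;
}
```

Because the result is BNSH in both cases, any consumer that reads the present buffers has to compute strides for BNSH even when the query tensor is BSNH.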