onnxruntime
Support Attention(24)-CUDA and disjoint from contrib op
#27542
Merged

titaiwangms added the ep:CUDA label
github-actions commented on 2026-03-04
titaiwangms ONNX Attention thin-dispatcher: direct flash/MEA/unfused dispatch wit… (020ac15a)
titaiwangms force pushed from ec737aa8 to 020ac15a 95 days ago
titaiwangms Update Operator Kernel document (e270a6ad)
titaiwangms Fix cutlass FMHA crash when attention bias stride is unaligned (6c500928)
titaiwangms Fix GQA decode eligibility, padding mask wiring, and 4D BNSH Q transpose (6ac08bf1)
titaiwangms Fix all-false mask crash in ConvertMaskToSeqlensKernel (f198af22)
titaiwangms Replace host memcpy with device-side fill for CUDA graph capture (9eb11e4b)
titaiwangms Handle bool mask in decode path to support variable padding (16e8b650)
titaiwangms Fix NaN output for all-false bool masks in MEA path (d3b49680)
titaiwangms Fix cutlass FMHA bias alignment crash for unaligned kv_sequence_length (e0760a85)
titaiwangms Fix 3D mask test to use consistent per-batch padding semantics (a827a1b0)
titaiwangms Remove 11 redundant GQA tests from test_gqa.py (9a0b546e)
titaiwangms Fix padding mask bugs: zero present buffers, decode offset, MEA 2D ex… (939a08cf)
titaiwangms Add TODO for GQA unfused attention fallback (118546dd)
titaiwangms Add TODO comments for GQA+float_mask and 4D present gaps (b8ea59e3)
titaiwangms Add TODO comments for softcap/softmax_precision and output_qk gaps (aca1cf8b)
titaiwangms Revert "Add TODO comments for GQA+float_mask and 4D present gaps" (09900762)
titaiwangms Code cleanup: remove dead function, fix comments, CUDA-graph-safe 2D … (cb647513)
titaiwangms Add test improvements: unfused MHA, 4D BNSH GQA, broadcast mask, floa… (76b006a6)
titaiwangms lint (c3f771af)
titaiwangms requested a review from copilot-pull-request-reviewer 95 days ago
copilot-pull-request-reviewer commented on 2026-03-04
titaiwangms commented on 2026-03-04
titaiwangms commented on 2026-03-05
titaiwangms marked this pull request as ready for review 94 days ago
titaiwangms Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs (d6f16af5)
titaiwangms Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs (813dae28)
titaiwangms commented on 2026-03-05
titaiwangms commented on 2026-03-05
titaiwangms force pushed from 286352ac to 813dae28 94 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 94 days ago
copilot-pull-request-reviewer commented on 2026-03-05
tianleiwu
titaiwangms Address PR review feedback: transpose helpers, assert fixes, SEGFAULT… (68a1b028)
titaiwangms Refine comments, fix docstrings, and remove dead code (27ee9afd)
titaiwangms Add clarifying comment for DispatchIsAligned bias alignment check (381cd83c)
titaiwangms
titaiwangms requested a review from copilot-pull-request-reviewer 93 days ago
titaiwangms requested a review from tianleiwu 93 days ago
copilot-pull-request-reviewer commented on 2026-03-05
titaiwangms changed the title from "Support Attention(24)-CUDA and decouple from contrib op" to "Support Attention(24)-CUDA and disjoint from contrib op" 93 days ago
titaiwangms Fix SM skip thresholds in attention tests (T25) (a1582510)
titaiwangms Address Copilot review: env var support, SM skip fix, nonpad+mask fal… (73da07ac)
titaiwangms requested a review from copilot-pull-request-reviewer 93 days ago
copilot-pull-request-reviewer commented on 2026-03-06
titaiwangms commented on 2026-03-06
titaiwangms commented on 2026-03-06
titaiwangms commented on 2026-03-06
titaiwangms commented on 2026-03-06
titaiwangms Wire up nonpad_kv_seqlen + attn_mask composition in unfused path (T28) (1e018940)
titaiwangms Fix 2D mask shape in GQA tests and add mask validation (T29) (31567e76)
titaiwangms Fix T28 review issues: guard mask dims, prevent divide-by-zero, fix s… (c5a1ebb0)
titaiwangms Add Python tests for nonpad_kv_seqlen + attn_mask combination (T31) (7111a4d8)
titaiwangms Address review feedback: BF16 fix, unfused nonpad+mask, test improvem… (f8cd6893)
titaiwangms Fix test failures: reference mask shape, seqlens size, invalid configs (4b87dd8c)
titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago
copilot-pull-request-reviewer commented on 2026-03-06
titaiwangms Validate present_key/present_value outputs in TensorScatter attention… (a6dce1a6)
titaiwangms Address Copilot review round 3: log level, present_kv validation, cle… (a16baab8)
titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago
copilot-pull-request-reviewer commented on 2026-03-06
titaiwangms Fix env var leak in tests: restore ORT_DISABLE_* after use (cb2ae8c5)
github-actions commented on 2026-03-07
titaiwangms lint (579af5f4)
titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago
copilot-pull-request-reviewer commented on 2026-03-07
titaiwangms Merge branch 'main' into titaiwang/design_attention_with_ai (46570e53)
titaiwangms requested a review from gramalingam 90 days ago
titaiwangms requested a review from justinchuby 90 days ago
titaiwangms requested a review from xadupre 90 days ago
gramalingam approved these changes on 2026-03-09
titaiwangms enabled auto-merge (squash) 90 days ago
titaiwangms merged 1cfda524 into main 90 days ago
titaiwangms deleted the titaiwang/design_attention_with_ai branch 90 days ago
tianleiwu commented on 2026-03-09
