onnxruntime
Support Attention(24)-CUDA and disjoint from contrib op
#27542

Merged


titaiwangms merged 37 commits into main from titaiwang/design_attention_with_ai

titaiwangms added ep:CUDA

github-actions commented on 2026-03-04

ONNX Attention thin-dispatcher: direct flash/MEA/unfused dispatch wit…

020ac15a
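This commit describes a thin dispatcher that routes directly to flash attention, memory-efficient attention (MEA), or an unfused kernel. Below is a minimal Python sketch of that priority-order selection. It assumes flash is tried first, MEA second, and the unfused path is the universal fallback. `choose_kernel`, `Kernel`, and the boolean eligibility and disable flags are hypothetical names, not the PR's code.

```python
from enum import Enum


class Kernel(Enum):
    FLASH = "flash"
    MEA = "memory_efficient"
    UNFUSED = "unfused"


def choose_kernel(flash_eligible: bool, mea_eligible: bool,
                  disable_flash: bool = False, disable_mea: bool = False) -> Kernel:
    """Pick the first eligible, non-disabled kernel in priority order.

    Hypothetical sketch: the *_eligible flags stand in for whatever checks
    (dtype, head size, GPU architecture, mask shape, ...) a real dispatcher runs.
    """
    if flash_eligible and not disable_flash:
        return Kernel.FLASH
    if mea_eligible and not disable_mea:
        return Kernel.MEA
    # Assumed fallback: the unfused path handles every remaining configuration.
    return Kernel.UNFUSED
```

Keeping the eligibility checks outside the selector makes each path easy to force off in tests, for example by mapping an environment variable to `disable_flash`.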

titaiwangms force pushed from ec737aa8 to 020ac15a 95 days ago

Update Operator Kernel document

e270a6ad

Fix cutlass FMHA crash when attention bias stride is unaligned

6c500928

Fix GQA decode eligibility, padding mask wiring, and 4D BNSH Q transpose

6ac08bf1
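The commit title mentions a transpose of 4D BNSH Q. The pure-Python sketch below shows that layout change on nested lists. It assumes the GQA path consumes Q in BSNH order (batch, sequence, heads, head_size), so a 4D BNSH input must be transposed first. The function name `bnsh_to_bsnh` is hypothetical.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]


def bnsh_to_bsnh(x: Tensor4D) -> Tensor4D:
    """Transpose a nested-list tensor from [B, N, S, H] to [B, S, N, H].

    Assumes a non-empty, rectangular input. Each innermost head_size vector
    is copied unchanged; only the heads and sequence axes swap places.
    """
    batch = len(x)
    num_heads = len(x[0])
    seq_len = len(x[0][0])
    return [
        [[list(x[b][n][s]) for n in range(num_heads)] for s in range(seq_len)]
        for b in range(batch)
    ]
```

On a GPU, the same permutation would be a strided copy rather than nested loops. The sketch only pins down the index mapping `out[b][s][n] = x[b][n][s]`.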

Fix all-false mask crash in ConvertMaskToSeqlensKernel

f198af22
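The PR does not say how `ConvertMaskToSeqlensKernel` was fixed. As an illustration only, here is a pure-Python sketch of converting a 2D bool padding mask into per-batch sequence lengths, in which a row with no valid token yields a length of 0. It assumes right padding, meaning valid tokens form a contiguous prefix. `mask_to_seqlens` is a hypothetical name.

```python
from typing import List


def mask_to_seqlens(mask: List[List[bool]]) -> List[int]:
    """Convert a [batch, kv_seq_len] bool padding mask to per-batch lengths.

    Assumes right padding: each row is a run of True (valid tokens) followed
    by False (padding). An all-False row yields 0 rather than an invalid index.
    """
    seqlens = []
    for row in mask:
        length = 0
        # Count the leading run of True values; stop at the first padding slot.
        for valid in row:
            if not valid:
                break
            length += 1
        seqlens.append(length)
    return seqlens
```

A length of 0 is still a legal value, so later stages need to treat that batch entry as having nothing to attend to.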

Replace host memcpy with device-side fill for CUDA graph capture

9eb11e4b

Handle bool mask in decode path to support variable padding

16e8b650

Fix NaN output for all-false bool masks in MEA path

d3b49680
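An all-false bool mask turns every score in a row into `-inf`. A naive softmax then computes `exp(-inf - (-inf)) = exp(nan)`, so the whole output row is NaN. The sketch below guards that case. Returning zeros for a fully masked row is an assumed convention chosen for illustration: the commit title only says the NaN output was fixed. `masked_softmax` is a hypothetical helper.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over `scores`, ignoring positions where `mask` is False.

    Assumes a non-empty row. A fully masked row returns all zeros instead of
    the NaNs a naive max-subtracted softmax would produce.
    """
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    row_max = max(masked)
    if row_max == -math.inf:
        # Every key is masked out: there is no valid attention target.
        return [0.0] * len(scores)
    exps = [math.exp(s - row_max) for s in masked]  # masked slots give exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```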

Fix cutlass FMHA bias alignment crash for unaligned kv_sequence_length

e0760a85
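Several commits deal with attention bias whose row stride is unaligned when `kv_sequence_length` is not a suitable multiple. The sketch below shows the kind of check a dispatcher could run before choosing the cutlass FMHA path. It assumes a contiguous bias whose row stride is `kv_sequence_length` elements and a kernel that reads it with 16-byte vector loads. Both the 16-byte figure and the name `bias_stride_is_aligned` are assumptions, not values taken from the PR.

```python
def bias_stride_is_aligned(kv_sequence_length: int, element_bytes: int,
                           alignment_bytes: int = 16) -> bool:
    """Return True if every bias row starts on an `alignment_bytes` boundary.

    Assumes a contiguous bias whose innermost row stride is kv_sequence_length
    elements, read with `alignment_bytes`-wide vectorized loads.
    """
    row_stride_bytes = kv_sequence_length * element_bytes
    return row_stride_bytes % alignment_bytes == 0
```

When the check fails, a dispatcher can route to a path without the vector-load requirement instead of launching a kernel that reads misaligned addresses. For fp16 (2 bytes per element), a `kv_sequence_length` of 8 passes and 7 does not.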

Fix 3D mask test to use consistent per-batch padding semantics

a827a1b0

Remove 11 redundant GQA tests from test_gqa.py

9a0b546e

Fix padding mask bugs: zero present buffers, decode offset, MEA 2D ex…

939a08cf

Add TODO for GQA unfused attention fallback

118546dd

Add TODO comments for GQA+float_mask and 4D present gaps

b8ea59e3

Add TODO comments for softcap/softmax_precision and output_qk gaps

aca1cf8b

Revert "Add TODO comments for GQA+float_mask and 4D present gaps"

09900762

Code cleanup: remove dead function, fix comments, CUDA-graph-safe 2D …

cb647513

Add test improvements: unfused MHA, 4D BNSH GQA, broadcast mask, floa…

76b006a6

lint

c3f771af

titaiwangms requested a review from copilot-pull-request-reviewer 95 days ago

copilot-pull-request-reviewer commented on 2026-03-04

titaiwangms commented on 2026-03-04

titaiwangms commented on 2026-03-05

titaiwangms marked this pull request as ready for review 94 days ago

Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs

d6f16af5

Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs

813dae28

titaiwangms commented on 2026-03-05

titaiwangms force pushed from 286352ac to 813dae28 94 days ago

titaiwangms requested a review from copilot-pull-request-reviewer 94 days ago

copilot-pull-request-reviewer commented on 2026-03-05

Address PR review feedback: transpose helpers, assert fixes, SEGFAULT…

68a1b028

Refine comments, fix docstrings, and remove dead code

27ee9afd

Add clarifying comment for DispatchIsAligned bias alignment check

381cd83c

titaiwangms requested a review from copilot-pull-request-reviewer 93 days ago

titaiwangms requested a review from tianleiwu 93 days ago

copilot-pull-request-reviewer commented on 2026-03-05

titaiwangms changed the title ~~Support Attention(24)-CUDA and decouple from contrib op~~ Support Attention(24)-CUDA and disjoint from contrib op 93 days ago

Fix SM skip thresholds in attention tests (T25)

a1582510

Address Copilot review: env var support, SM skip fix, nonpad+mask fal…

73da07ac

titaiwangms requested a review from copilot-pull-request-reviewer 93 days ago

copilot-pull-request-reviewer commented on 2026-03-06

titaiwangms commented on 2026-03-06

Wire up nonpad_kv_seqlen + attn_mask composition in unfused path (T28)

1e018940
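This commit composes `nonpad_kv_seqlen` with `attn_mask` in the unfused path. The sketch below shows one plausible composition: a key is attendable only if the bool `attn_mask` allows it and its index lies below that batch entry's valid KV length. It assumes a 2D `[q_len, kv_len]` bool mask broadcast over the batch and a logical-AND combination. `compose_mask` is a hypothetical name.

```python
from typing import List


def compose_mask(attn_mask: List[List[bool]],
                 nonpad_kv_seqlen: List[int]) -> List[List[List[bool]]]:
    """Combine a [q_len, kv_len] bool attn_mask with per-batch valid KV lengths.

    Returns a [batch, q_len, kv_len] mask where key k is attendable only if
    attn_mask allows it AND k < nonpad_kv_seqlen[b].
    """
    return [
        [
            [allowed and k < valid_len for k, allowed in enumerate(q_row)]
            for q_row in attn_mask
        ]
        for valid_len in nonpad_kv_seqlen
    ]
```

Composition can leave a query row with no attendable key, so the softmax that consumes this mask needs a guard for fully masked rows.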

Fix 2D mask shape in GQA tests and add mask validation (T29)

31567e76

Fix T28 review issues: guard mask dims, prevent divide-by-zero, fix s…

c5a1ebb0

Add Python tests for nonpad_kv_seqlen + attn_mask combination (T31)

7111a4d8

Address review feedback: BF16 fix, unfused nonpad+mask, test improvem…

f8cd6893

Fix test failures: reference mask shape, seqlens size, invalid configs

4b87dd8c

titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago

copilot-pull-request-reviewer commented on 2026-03-06

Validate present_key/present_value outputs in TensorScatter attention…

a6dce1a6

Address Copilot review round 3: log level, present_kv validation, cle…

a16baab8

titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago

copilot-pull-request-reviewer commented on 2026-03-06

Fix env var leak in tests: restore ORT_DISABLE_* after use

cb2ae8c5
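The leak this commit fixes is a common test pattern: a test sets a variable such as one of the `ORT_DISABLE_*` switches and never restores it, so later tests run with a different kernel selection. Below is a minimal sketch of a scoped setter that restores the previous value, or removes the variable, even if the body raises. `scoped_env` is a hypothetical helper, and the variable name used in any example is illustrative.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def scoped_env(name: str, value: str) -> Iterator[None]:
    """Set an environment variable for the duration of a `with` block.

    On exit the previous value is restored, or the variable is removed if it
    was not set before, so one test's setting cannot leak into the next.
    """
    previous: Optional[str] = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

The standard library's `unittest.mock.patch.dict(os.environ, {...})` gives the same restore-on-exit guarantee and may be preferable inside `unittest` suites.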

github-actions commented on 2026-03-07

lint

579af5f4

titaiwangms requested a review from copilot-pull-request-reviewer 92 days ago

copilot-pull-request-reviewer commented on 2026-03-07

Merge branch 'main' into titaiwang/design_attention_with_ai

46570e53

titaiwangms requested a review from gramalingam 90 days ago

titaiwangms requested a review from justinchuby 90 days ago

titaiwangms requested a review from xadupre 90 days ago

gramalingam approved these changes on 2026-03-09

titaiwangms enabled auto-merge (squash) 90 days ago

titaiwangms merged 1cfda524 into main 90 days ago

titaiwangms deleted the titaiwang/design_attention_with_ai branch 90 days ago

tianleiwu commented on 2026-03-09

Reviewers

gramalingam

tianleiwu

github-actions

copilot-pull-request-reviewer

justinchuby

xadupre

Assignees

No one assigned

Labels

ep:CUDA

Milestone

No milestone
