Support Attention(24)-CUDA and disjoint from contrib op #27542
ONNX Attention thin-dispatcher: direct flash/MEA/unfused dispatch wit…
020ac15a
titaiwangms force pushed from ec737aa8 to 020ac15a 95 days ago
Update Operator Kernel document
e270a6ad
Fix cutlass FMHA crash when attention bias stride is unaligned
6c500928
Fix GQA decode eligibility, padding mask wiring, and 4D BNSH Q transpose
6ac08bf1
Fix all-false mask crash in ConvertMaskToSeqlensKernel
f198af22
Replace host memcpy with device-side fill for CUDA graph capture
9eb11e4b
Handle bool mask in decode path to support variable padding
16e8b650
Fix NaN output for all-false bool masks in MEA path
d3b49680
Fix cutlass FMHA bias alignment crash for unaligned kv_sequence_length
e0760a85
Fix 3D mask test to use consistent per-batch padding semantics
a827a1b0
Remove 11 redundant GQA tests from test_gqa.py
9a0b546e
Fix padding mask bugs: zero present buffers, decode offset, MEA 2D ex…
939a08cf
Add TODO for GQA unfused attention fallback
118546dd
Add TODO comments for GQA+float_mask and 4D present gaps
b8ea59e3
Add TODO comments for softcap/softmax_precision and output_qk gaps
aca1cf8b
Revert "Add TODO comments for GQA+float_mask and 4D present gaps"
09900762
Code cleanup: remove dead function, fix comments, CUDA-graph-safe 2D …
cb647513
Add test improvements: unfused MHA, 4D BNSH GQA, broadcast mask, floa…
76b006a6
lint
c3f771af
titaiwangms marked this pull request as ready for review 94 days ago
Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs
d6f16af5
Fix 2D mask shape, add 4D BNSH present_kv, cleanup and docs
813dae28
titaiwangms force pushed from 286352ac to 813dae28 94 days ago
Address PR review feedback: transpose helpers, assert fixes, SEGFAULT…
68a1b028
Refine comments, fix docstrings, and remove dead code
27ee9afd
Add clarifying comment for DispatchIsAligned bias alignment check
381cd83c
titaiwangms changed the title from "Support Attention(24)-CUDA and decouple from contrib op" to "Support Attention(24)-CUDA and disjoint from contrib op" 93 days ago
Fix SM skip thresholds in attention tests (T25)
a1582510
Address Copilot review: env var support, SM skip fix, nonpad+mask fal…
73da07ac
Wire up nonpad_kv_seqlen + attn_mask composition in unfused path (T28)
1e018940
Fix 2D mask shape in GQA tests and add mask validation (T29)
31567e76
Fix T28 review issues: guard mask dims, prevent divide-by-zero, fix s…
c5a1ebb0
Add Python tests for nonpad_kv_seqlen + attn_mask combination (T31)
7111a4d8
Address review feedback: BF16 fix, unfused nonpad+mask, test improvem…
f8cd6893
Fix test failures: reference mask shape, seqlens size, invalid configs
4b87dd8c
Validate present_key/present_value outputs in TensorScatter attention…
a6dce1a6
Address Copilot review round 3: log level, present_kv validation, cle…
a16baab8
Fix env var leak in tests: restore ORT_DISABLE_* after use
cb2ae8c5
lint
579af5f4
Merge branch 'main' into titaiwang/design_attention_with_ai
46570e53
titaiwangms deleted the titaiwang/design_attention_with_ai branch 90 days ago