onnxruntime
Fix CUDA ONNX Attention: min_bias_align crash on SM<80 and MEA NaN for fully-masked batches
#27831
Merged

titaiwangms merged 1 commit into main from titaiwang/fix_cuda
titaiwangms
yuslepukhin requested a review from copilot-pull-request-reviewer 51 days ago
yuslepukhin removed the review request from copilot-pull-request-reviewer 51 days ago
yuslepukhin requested a review from tianleiwu 51 days ago
yuslepukhin commented on 2026-03-24
github-actions commented on 2026-03-24
titaiwangms force pushed from a04b41a4 to 38fef79f 51 days ago
titaiwangms force pushed from 38fef79f to 07960375 50 days ago
titaiwangms changed the title from "Fix TF32 misaligned address error in cuBLAS GEMM functions" to "Fix misaligned BiasLoader access in CUTLASS FMHA attention dispatch" 50 days ago
titaiwangms assigned tianleiwu 50 days ago
titaiwangms
titaiwangms force pushed from 07960375 to 9a1bf882 50 days ago
titaiwangms force pushed from 9a1bf882 to 2cccef28 49 days ago
github-actions commented on 2026-03-26
titaiwangms commented on 2026-03-27
titaiwangms unassigned tianleiwu 49 days ago
titaiwangms force pushed from 2cccef28 to 678c8408 49 days ago
titaiwangms changed the title from "Fix misaligned BiasLoader access in CUTLASS FMHA attention dispatch" to "Fix ONNX Attention CUDA: bias alignment, unfused decode concat, and MEA NaN" 49 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 49 days ago
copilot-pull-request-reviewer commented on 2026-03-27
titaiwangms force pushed from 678c8408 to ab43783a 49 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 49 days ago
copilot-pull-request-reviewer commented on 2026-03-27
titaiwangms marked this pull request as ready for review 48 days ago
titaiwangms closed this 48 days ago
titaiwangms reopened this 48 days ago
titaiwangms
titaiwangms assigned tianleiwu 48 days ago
yuslepukhin requested a review from copilot-pull-request-reviewer 48 days ago
copilot-pull-request-reviewer commented on 2026-03-27
titaiwangms
titaiwangms marked this pull request as draft 48 days ago
titaiwangms force pushed from ab43783a to 38f4ca42 48 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 48 days ago
copilot-pull-request-reviewer commented on 2026-03-28
titaiwangms changed the title from "Fix ONNX Attention CUDA: bias alignment, unfused decode concat, and MEA NaN" to "Fix CUDA ONNX Attention: min_bias_align crash on SM<80 and MEA NaN for fully-masked batches" 48 days ago
titaiwangms marked this pull request as ready for review 47 days ago
titaiwangms
yuslepukhin requested a review from copilot-pull-request-reviewer 45 days ago
copilot-pull-request-reviewer commented on 2026-03-30
titaiwangms force pushed from ca2fc349 to cedf1cd3 45 days ago
titaiwangms force pushed from cedf1cd3 to 2b73e0f7 45 days ago
titaiwangms
titaiwangms added this to the 1.25.0 milestone 43 days ago
tianleiwu commented on 2026-04-01
titaiwangms force pushed from ee1429f3 to bde34a52 43 days ago
titaiwangms force pushed from bde34a52 to f23ade69 43 days ago
titaiwangms force pushed from f23ade69 to 09429ab8 43 days ago
titaiwangms force pushed from 09429ab8 to 40764f34 43 days ago
titaiwangms
titaiwangms force pushed from 40764f34 to 86738d2f 43 days ago
tianleiwu commented on 2026-04-01
titaiwangms force pushed from 86738d2f to ed246ab1 42 days ago
titaiwangms force pushed from ed246ab1 to aeb40852 42 days ago
tianleiwu commented on 2026-04-02
tianleiwu commented on 2026-04-02
titaiwangms committed 1e59aa50: Fix ONNX Attention: min_bias_align, UnfusedRunner decode, MEA NaN
titaiwangms force pushed from aeb40852 to 1e59aa50 42 days ago
titaiwangms requested a review from tianleiwu 41 days ago
tianleiwu approved these changes on 2026-04-04
titaiwangms merged eb706ed3 into main 41 days ago
titaiwangms deleted the titaiwang/fix_cuda branch 41 days ago