Fix CUDA ONNX Attention: min_bias_align crash on SM<80 and MEA NaN for fully-masked batches #27831
titaiwangms force pushed from a04b41a4 to 38fef79f 51 days ago
titaiwangms force pushed from 38fef79f to 07960375 50 days ago
titaiwangms changed the title from "Fix TF32 misaligned address error in cuBLAS GEMM functions" to "Fix misaligned BiasLoader access in CUTLASS FMHA attention dispatch" 50 days ago
titaiwangms force pushed from 07960375 to 9a1bf882 50 days ago
titaiwangms force pushed from 9a1bf882 to 2cccef28 49 days ago
titaiwangms force pushed from 2cccef28 to 678c8408 49 days ago
titaiwangms changed the title from "Fix misaligned BiasLoader access in CUTLASS FMHA attention dispatch" to "Fix ONNX Attention CUDA: bias alignment, unfused decode concat, and MEA NaN" 49 days ago
titaiwangms force pushed from 678c8408 to ab43783a 49 days ago
titaiwangms marked this pull request as ready for review 48 days ago
titaiwangms marked this pull request as draft 48 days ago
titaiwangms force pushed from ab43783a to 38f4ca42 48 days ago
titaiwangms changed the title from "Fix ONNX Attention CUDA: bias alignment, unfused decode concat, and MEA NaN" to "Fix CUDA ONNX Attention: min_bias_align crash on SM<80 and MEA NaN for fully-masked batches" 48 days ago
titaiwangms marked this pull request as ready for review 47 days ago
titaiwangms force pushed from ca2fc349 to cedf1cd3 45 days ago
titaiwangms force pushed from cedf1cd3 to 2b73e0f7 45 days ago
titaiwangms added this to the 1.25.0 milestone 43 days ago
titaiwangms force pushed from ee1429f3 to bde34a52 43 days ago
titaiwangms force pushed from bde34a52 to f23ade69 43 days ago
titaiwangms force pushed from f23ade69 to 09429ab8 43 days ago
titaiwangms force pushed from 09429ab8 to 40764f34 43 days ago
titaiwangms force pushed from 40764f34 to 86738d2f 43 days ago
titaiwangms force pushed from 86738d2f to ed246ab1 42 days ago
titaiwangms force pushed from ed246ab1 to aeb40852 42 days ago
Commit 1e59aa50: Fix ONNX Attention: min_bias_align, UnfusedRunner decode, MEA NaN
titaiwangms force pushed from aeb40852 to 1e59aa50 42 days ago
tianleiwu approved these changes on 2026-04-04
titaiwangms deleted the titaiwang/fix_cuda branch 41 days ago