Attention CUDA BFloat16 Support #25974
Start attention bf16
6bfb30ed
More changes
b6f07644
Version we can run
b70caa93
More bf16 changes
02d50384
Remove hardcoded if constexpr expressions
1480c57f
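
A hedged sketch of the idea behind this commit: instead of branching on the element type with `if constexpr` inside each kernel, map the element type to its native CUDA type once via a trait and write a single generic path. The trait, wrapper types, and kernel below are illustrative stand-ins, not the PR's actual code.

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Hypothetical framework wrapper types (stand-ins for e.g. MLFloat16/BFloat16).
struct WrappedHalf { unsigned short v; };
struct WrappedBf16 { unsigned short v; };

template <typename T> struct ToNativeCudaType      { using type = T; };
template <> struct ToNativeCudaType<WrappedHalf>   { using type = __half; };
template <> struct ToNativeCudaType<WrappedBf16>   { using type = __nv_bfloat16; };

// One templated kernel body instead of per-type `if constexpr` branches.
template <typename T>
__global__ void Scale(typename ToNativeCudaType<T>::type* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float x = static_cast<float>(data[i]);  // promote, compute in fp32
    data[i] = static_cast<typename ToNativeCudaType<T>::type>(x * factor);
  }
}
```
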
Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
cc7a7284
Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
7b656120
Fmt files
63f93293
Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
e0089e42
Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
849c5c77
Remove unused param
bf8c8202
Merge branch 'nebanfic/attention-bf16' of https://github.com/microsof…
8b13adad
Fix mmha_launch_kernel error
2f5b9e8d
Fix masked_multihead_attention_kernel
eb1fb854
Add bfloat16 at start in docs
0b76cda3
Update docs
18fc903e
Remove head_size%4 cases
54f4583a
Remove if constexpr for attention impl
718d5f56
Add back nv_bfloat16
3a03d656
Change if statements
51194dc5
Use native CUDA type in cudaMemcpyDeviceToDevice
959a015d
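
A minimal sketch of what "native CUDA type" means here, assuming a device-to-device copy of bf16 tensors: size the copy with `__nv_bfloat16` directly rather than a framework wrapper type. The function name is illustrative.

```cuda
#include <cuda_bf16.h>
#include <cuda_runtime.h>

cudaError_t CopyBf16DeviceToDevice(__nv_bfloat16* dst, const __nv_bfloat16* src,
                                   size_t count, cudaStream_t stream) {
  // sizeof(__nv_bfloat16) == 2; copies stay on the device, off the host path.
  return cudaMemcpyAsync(dst, src, count * sizeof(__nv_bfloat16),
                         cudaMemcpyDeviceToDevice, stream);
}
```
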
Introduce nv_bfloat164
bde3277a
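
CUDA ships `__nv_bfloat162` but no 4-wide bf16 vector, so a helper struct is needed for vectorized loads/stores, mirroring the `Half4` pattern these kernels already use for fp16. A sketch under that assumption:

```cuda
#include <cuda_bf16.h>

struct alignas(8) nv_bfloat164 {
  __nv_bfloat162 x;
  __nv_bfloat162 y;
};

// Example elementwise add (native bf16 arithmetic requires SM80+):
__device__ inline nv_bfloat164 operator+(const nv_bfloat164& a,
                                         const nv_bfloat164& b) {
  nv_bfloat164 r;
  r.x = __hadd2(a.x, b.x);  // packed add on two bf16 lanes
  r.y = __hadd2(a.y, b.y);
  return r;
}
```
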
Refactoring
19a465ae
Disable memory efficient attention when using bf16
e2102586
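
Illustrative sketch of the gating introduced here: bf16 inputs opt out of the memory-efficient attention path, so kernel selection falls through to the other implementations. Names and the exact conditions are assumptions, not the PR's code.

```cuda
bool UseMemoryEfficientAttention(bool is_bf16,
                                 bool kernel_enabled,
                                 int sm /* compute capability, e.g. 80 */) {
  if (is_bf16) {
    return false;  // bf16 is not routed to memory efficient attention
  }
  return kernel_enabled && sm >= 50;  // threshold illustrative only
}
```
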
Solve warnings
58b93e0b
Add check for GPU type
f34f8d18
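
bf16 kernels need an Ampere-class GPU (compute capability 8.0+), so the op can reject bf16 inputs on older hardware up front. A minimal sketch of such a check; the function name is illustrative.

```cuda
#include <cuda_runtime.h>

bool DeviceSupportsBf16(int device) {
  int major = 0;
  cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
  return major >= 8;  // bf16 compute requires SM80 or newer
}
```
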
Assert on the number of heads
1b0ed265
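
Sketch of the idea behind this commit: fail fast when the head count breaks a kernel assumption. The exact constraint asserted in the PR is not shown here, so the one below is illustrative.

```cuda
#include <cassert>

void CheckNumHeads(int num_heads, int kv_num_heads) {
  assert(num_heads > 0 && kv_num_heads > 0);
  assert(num_heads % kv_num_heads == 0);  // illustrative constraint
}
```
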
Merge CopyQK
c322868b
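
A hedged guess at what merging `CopyQK` looks like: one templated kernel covering float/half/bf16 instead of separate per-type copies. The signature and semantics below are assumptions for illustration only.

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>

template <typename T>
__global__ void CopyQK(const T* q_in, const T* k_in, T* qk_out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    qk_out[i] = q_in[i];      // copy Q block
    qk_out[n + i] = k_in[i];  // copy K block
  }
}

// One definition, explicit instantiations for each supported element type:
template __global__ void CopyQK<float>(const float*, const float*, float*, int);
template __global__ void CopyQK<half>(const half*, const half*, half*, int);
template __global__ void CopyQK<__nv_bfloat16>(const __nv_bfloat16*,
                                               const __nv_bfloat16*,
                                               __nv_bfloat16*, int);
```
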
Assert in LaunchTransCtx
4dd6f687
tianleiwu approved these changes on 2025-09-15
nenad1002 merged 8301eea3 into main 191 days ago
nenad1002 deleted the nebanfic/attention-bf16-2 branch 191 days ago