onnxruntime
Attention CUDA BFloat16 Support
#25974
Merged

nenad1002 merged 29 commits into main from nebanfic/attention-bf16-2
nenad1002 Start attention bf16
6bfb30ed
nenad1002 More changes
b6f07644
nenad1002 Version we can run
b70caa93
nenad1002 More bf16 changes
02d50384
nenad1002 Remove hardcoded if constexpr expressions
1480c57f
nenad1002 Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
cc7a7284
nenad1002 Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
7b656120
nenad1002 Fmt files
63f93293
nenad1002 Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
e0089e42
nenad1002 Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
849c5c77
nenad1002 Remove unused param
bf8c8202
nenad1002 Merge branch 'nebanfic/attention-bf16' of https://github.com/microsof…
8b13adad
nenad1002 Fix mmha_launch_kernel err
2f5b9e8d
nenad1002 Fix masked_multihead_attention_kernel
eb1fb854
nenad1002 add bfloat16 at start in docs
0b76cda3
nenad1002 Update docs
18fc903e
nenad1002 Remove head_size%4 cases
54f4583a
nenad1002 Remove if consexper for attention impl
718d5f56
nenad1002 Add back nv_bfloat16
3a03d656
nenad1002 Change if statements
51194dc5
nenad1002 use native cuda type on cudamemdevicetodevice
959a015d
nenad1002 introduce nv_bfloat164
bde3277a
nenad1002 refactoring
19a465ae
nenad1002 Disable memory efficient attention when using bf16
e2102586
nenad1002 Solve warnings
58b93e0b
nenad1002 requested a review from tianleiwu 196 days ago
tianleiwu commented on 2025-09-10
tianleiwu commented on 2025-09-10
tianleiwu commented on 2025-09-10
tianleiwu commented on 2025-09-10
tianleiwu commented on 2025-09-10
tianleiwu commented on 2025-09-10
nenad1002 Add check for GPU type
f34f8d18
nenad1002 Assert on the number of heads
1b0ed265
nenad1002 merge CopyQK
c322868b
nenad1002 assert in launchtransctx
4dd6f687
nenad1002 requested a review from tianleiwu 194 days ago
tianleiwu approved these changes on 2025-09-15
nenad1002 merged 8301eea3 into main 191 days ago
nenad1002 deleted the nebanfic/attention-bf16-2 branch 191 days ago