Attention CUDA BFloat16 Support #25974
Start attention bf16
6bfb30ed
More changes
b6f07644
Version we can run
b70caa93
More bf16 changes
02d50384
Remove hardcoded if constexpr expressions
1480c57f
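
A hedged sketch of the idea behind this commit: instead of branching on the element type with `if constexpr` inside each kernel, map the element type to its native CUDA type once via a trait and write a single generic path. The trait, wrapper types, and kernel below are illustrative stand-ins, not the PR's actual code.

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Hypothetical framework wrapper types (stand-ins for e.g. MLFloat16/BFloat16).
struct WrappedHalf { unsigned short v; };
struct WrappedBf16 { unsigned short v; };

template <typename T> struct ToNativeCudaType      { using type = T; };
template <> struct ToNativeCudaType<WrappedHalf>   { using type = __half; };
template <> struct ToNativeCudaType<WrappedBf16>   { using type = __nv_bfloat16; };

// One templated kernel body instead of per-type `if constexpr` branches.
template <typename T>
__global__ void Scale(typename ToNativeCudaType<T>::type* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float x = static_cast<float>(data[i]);  // promote, compute in fp32
    data[i] = static_cast<typename ToNativeCudaType<T>::type>(x * factor);
  }
}
```
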
Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
cc7a7284
Update onnxruntime/contrib_ops/cuda/bert/utils.cuh
7b656120
Fmt files
63f93293
Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
e0089e42
Update onnxruntime/contrib_ops/cuda/bert/packed_attention_impl.cu
849c5c77
Remove unused param
bf8c8202
Merge branch 'nebanfic/attention-bf16' of https://github.com/microsof…
8b13adad
Fix mmha_launch_kernel error
2f5b9e8d
Fix masked_multihead_attention_kernel
eb1fb854
Add bfloat16 at start in docs
0b76cda3
Update docs
18fc903e
Remove head_size%4 cases
54f4583a
Remove if constexpr for attention impl
718d5f56
Add back nv_bfloat16
3a03d656
Change if statements
51194dc5
Use native CUDA type in cudaMemcpyDeviceToDevice
959a015d
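
A minimal sketch of what "native CUDA type" means here, assuming a device-to-device copy of bf16 tensors: size the copy with `__nv_bfloat16` directly rather than a framework wrapper type. The function name is illustrative.

```cuda
#include <cuda_bf16.h>
#include <cuda_runtime.h>

cudaError_t CopyBf16DeviceToDevice(__nv_bfloat16* dst, const __nv_bfloat16* src,
                                   size_t count, cudaStream_t stream) {
  // sizeof(__nv_bfloat16) == 2; copies stay on the device, off the host path.
  return cudaMemcpyAsync(dst, src, count * sizeof(__nv_bfloat16),
                         cudaMemcpyDeviceToDevice, stream);
}
```
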
Introduce nv_bfloat164
bde3277a
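
CUDA ships `__nv_bfloat162` but no 4-wide bf16 vector, so a helper struct is needed for vectorized loads/stores, mirroring the `Half4` pattern these kernels already use for fp16. A sketch under that assumption:

```cuda
#include <cuda_bf16.h>

struct alignas(8) nv_bfloat164 {
  __nv_bfloat162 x;
  __nv_bfloat162 y;
};

// Example elementwise add (native bf16 arithmetic requires SM80+):
__device__ inline nv_bfloat164 operator+(const nv_bfloat164& a,
                                         const nv_bfloat164& b) {
  nv_bfloat164 r;
  r.x = __hadd2(a.x, b.x);  // packed add on two bf16 lanes
  r.y = __hadd2(a.y, b.y);
  return r;
}
```
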
Refactoring
19a465ae
Disable memory efficient attention when using bf16
e2102586
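
Illustrative sketch of the gating introduced here: bf16 inputs opt out of the memory-efficient attention path, so kernel selection falls through to the other implementations. Names and the exact conditions are assumptions, not the PR's code.

```cuda
bool UseMemoryEfficientAttention(bool is_bf16,
                                 bool kernel_enabled,
                                 int sm /* compute capability, e.g. 80 */) {
  if (is_bf16) {
    return false;  // bf16 is not routed to memory efficient attention
  }
  return kernel_enabled && sm >= 50;  // threshold illustrative only
}
```
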
Solve warnings
58b93e0b
Add check for GPU type
f34f8d18
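
bf16 kernels need an Ampere-class GPU (compute capability 8.0+), so the op can reject bf16 inputs on older hardware up front. A minimal sketch of such a check; the function name is illustrative.

```cuda
#include <cuda_runtime.h>

bool DeviceSupportsBf16(int device) {
  int major = 0;
  cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
  return major >= 8;  // bf16 compute requires SM80 or newer
}
```
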
Assert on the number of heads
1b0ed265
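
Sketch of the idea behind this commit: fail fast when the head count breaks a kernel assumption. The exact constraint asserted in the PR is not shown here, so the one below is illustrative.

```cuda
#include <cassert>

void CheckNumHeads(int num_heads, int kv_num_heads) {
  assert(num_heads > 0 && kv_num_heads > 0);
  assert(num_heads % kv_num_heads == 0);  // illustrative constraint
}
```
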
Merge CopyQK
c322868b
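
A hedged guess at what merging `CopyQK` looks like: one templated kernel covering float/half/bf16 instead of separate per-type copies. The signature and semantics below are assumptions for illustration only.

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>

template <typename T>
__global__ void CopyQK(const T* q_in, const T* k_in, T* qk_out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    qk_out[i] = q_in[i];      // copy Q block
    qk_out[n + i] = k_in[i];  // copy K block
  }
}

// One definition, explicit instantiations for each supported element type:
template __global__ void CopyQK<float>(const float*, const float*, float*, int);
template __global__ void CopyQK<half>(const half*, const half*, half*, int);
template __global__ void CopyQK<__nv_bfloat16>(const __nv_bfloat16*,
                                               const __nv_bfloat16*,
                                               __nv_bfloat16*, int);
```
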
Assert in LaunchTransCtx
4dd6f687
tianleiwu approved these changes on 2025-09-15
nenad1002 merged 8301eea3 into main 191 days ago
nenad1002 deleted the nebanfic/attention-bf16-2 branch 191 days ago