onnxruntime
[CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100
#18695
Merged

yufenglee merged 6 commits into main from hari/more_data_in_flight
hariharans29
hariharans29 Initial change
a1ce85a6
hariharans29 requested a review from tianleiwu 2 years ago
hariharans29 requested a review from yufenglee 2 years ago
hariharans29 requested a review from wangyems 2 years ago
hariharans29 changed the title from "Improve performance of DecoderMaskedMultiheadAttention on A100" to "WIP: Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
hariharans29
hariharans29 commented on 2023-12-04
hariharans29
hariharans29 commented on 2023-12-04
hariharans29
hariharans29 commented on 2023-12-04
hariharans29
hariharans29 commented on 2023-12-04
hariharans29 requested a review from zhanghuanrong 2 years ago
hariharans29 changed the title from "WIP: Improve performance of DecoderMaskedMultiheadAttention on A100" to "WIP: [CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
hariharans29 More tests
3661d9de
hariharans29 Fix test
3effe0cf
hariharans29 Formatting
8b8c8fe3
hariharans29 Fix
a7adae1e
hariharans29 changed the title from "WIP: [CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" to "[CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
wangyems
hariharans29
hariharans29 Minor formating and PR comments
9d768634
wangyems
wangyems approved these changes on 2023-12-06
yufenglee merged f68dfcd8 into main 2 years ago
yufenglee deleted the hari/more_data_in_flight branch 2 years ago