[CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100 #18695
Initial change
a1ce85a6
hariharans29
changed the title from "Improve performance of DecoderMaskedMultiheadAttention on A100" to "WIP: Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
hariharans29
changed the title from "WIP: Improve performance of DecoderMaskedMultiheadAttention on A100" to "WIP: [CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
More tests
3661d9de
Fix test
3effe0cf
Formatting
8b8c8fe3
Fix
a7adae1e
hariharans29
changed the title from "WIP: [CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" to "[CUDA] Improve performance of DecoderMaskedMultiheadAttention on A100" 2 years ago
Minor formatting and PR comments
9d768634
wangyems
approved these changes
on 2023-12-06
yufenglee
merged
f68dfcd8
into main 2 years ago
yufenglee
deleted the hari/more_data_in_flight branch 2 years ago