onnxruntime
f773c90e - [CUDA] DecoderMaskedMultiHeadAttention files consolidation (#27688)

Commit

69 days ago

[CUDA] DecoderMaskedMultiHeadAttention files consolidation (#27688) This deletes three per-head-size .cu files and merges their content into a single file to avoid a dependency between files during CUDA compilation. Currently, the masked_multihead_attention_kernel template is implemented in decoder_masked_multihead_attention_impl.cu. The other three .cu files use the masked_multihead_attention_kernel template but do not include its implementation, which causes problems when they are built in the CUDA plugin EP.
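The consolidated layout can be sketched in plain C++ as follows. This is a minimal illustration, not the code from the commit: `scratch_bytes_sketch` and `dispatch_sketch` are hypothetical stand-ins for the kernel template and its launcher, and the head sizes 32, 64, and 128 are assumed for the example. The point it shows is that once the template definition and its per-head-size instantiations live in one translation unit, no file refers to a template whose body it cannot see.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the kernel template. The real
// masked_multihead_attention_kernel has a different signature and body.
template <typename T, int kHeadSize>
std::size_t scratch_bytes_sketch(int num_heads) {
  assert(num_heads >= 0);
  return sizeof(T) * static_cast<std::size_t>(kHeadSize) *
         static_cast<std::size_t>(num_heads);
}

// Explicit instantiations placed in the same file as the definition.
// Head sizes 32/64/128 are an assumption made for illustration.
template std::size_t scratch_bytes_sketch<float, 32>(int);
template std::size_t scratch_bytes_sketch<float, 64>(int);
template std::size_t scratch_bytes_sketch<float, 128>(int);

// Maps a runtime head size onto a compile-time template argument.
// Returns 0 for a head size that has no instantiation in this sketch.
std::size_t dispatch_sketch(int head_size, int num_heads) {
  switch (head_size) {
    case 32:
      return scratch_bytes_sketch<float, 32>(num_heads);
    case 64:
      return scratch_bytes_sketch<float, 64>(num_heads);
    case 128:
      return scratch_bytes_sketch<float, 128>(num_heads);
    default:
      return 0;
  }
}
```

If the three instantiations were instead split into separate source files that only see a declaration of the template, each file would depend on another translation unit to supply the definition, which is one plausible reading of the build problem the commit message describes.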

References

#27688 - [CUDA] DecoderMaskedMultiHeadAttention files consolidation

Author

tianleiwu

Parents

672e3bbf
