onnxruntime
f773c90e - [CUDA] DecoderMaskedMultiHeadAttention files consolidation (#27688)

Commit
45 days ago
This deletes three per-head-size .cu files and merges their content into a single file, removing a cross-file dependency during CUDA compilation. Currently, the masked_multihead_attention_kernel template is implemented in decoder_masked_multihead_attention_impl.cu. The other three .cu files use the masked_multihead_attention_kernel template but do not include its implementation, which causes problems when they are built in the CUDA plugin EP.
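The consolidated layout can be illustrated with a minimal host-only C++ sketch. It is not the onnxruntime code. The names `masked_attention_dot_sketch` and `dispatch_by_head_size_sketch`, the head sizes 32/64/128, and the dot-product body are all assumptions chosen for illustration. The point it shows is that a template's definition and every per-head-size instantiation sit in the same translation unit, so no file needs a definition that lives elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a kernel template (assumption: the real kernel
// is a CUDA __global__ template; this host function only mirrors the layout).
// The full definition is visible here, in the same file as its users.
template <typename T, int HeadSize>
T masked_attention_dot_sketch(const std::vector<T>& q, const std::vector<T>& k) {
  assert(q.size() == static_cast<std::size_t>(HeadSize));
  assert(k.size() == q.size());
  T acc = T(0);
  for (int i = 0; i < HeadSize; ++i) {
    acc += q[i] * k[i];  // placeholder work standing in for the attention math
  }
  return acc;
}

// Hypothetical dispatcher that previously might have been split across one
// file per head size. Each case instantiates the template, which compiles
// because the definition above is in this translation unit.
template <typename T>
T dispatch_by_head_size_sketch(int head_size,
                               const std::vector<T>& q,
                               const std::vector<T>& k) {
  switch (head_size) {
    case 32:
      return masked_attention_dot_sketch<T, 32>(q, k);
    case 64:
      return masked_attention_dot_sketch<T, 64>(q, k);
    case 128:
      return masked_attention_dot_sketch<T, 128>(q, k);
    default:
      throw std::invalid_argument("unsupported head size (illustrative list only)");
  }
}
```

If the template were only declared in a header and defined in a different source file, each per-head-size file would compile against a definition it cannot see, and the build would then depend on how the separately compiled objects are linked together. Keeping everything in one file sidesteps that dependency.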