onnxruntime
293a5ac5 - Make DMMHA kernel inside MHA optional for Whisper (#25166)

### Description

This PR disables support for the `DecoderMaskedMultiHeadAttention` (DMMHA) kernel inside `MultiHeadAttention` (MHA) by default, making it opt-in.

### Motivation and Context

Models that contain the extra inputs for DMMHA (i.e. `past_sequence_length` and `cache_indirection`) have some runtime issues. Additionally, not all execution providers implement the DMMHA kernel inside MHA, and those that do not will therefore not support these extra inputs.
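Whether an exported decoder model relies on the DMMHA path can be checked by looking for the extra graph inputs named above. Below is a minimal sketch using the `onnx` Python package; the model path `whisper_decoder.onnx` is a placeholder, not a file produced by this PR.

```python
# Minimal sketch: detect whether an exported model carries the extra
# DMMHA inputs (`past_sequence_length`, `cache_indirection`).
# The model path is hypothetical; substitute your own Whisper decoder export.
import onnx

DMMHA_INPUTS = {"past_sequence_length", "cache_indirection"}

model = onnx.load("whisper_decoder.onnx")
graph_inputs = {inp.name for inp in model.graph.input}

found = DMMHA_INPUTS & graph_inputs
if found:
    # Such models depend on the DMMHA kernel inside MHA, which this
    # commit makes opt-in rather than enabled by default.
    print("Model carries DMMHA inputs:", sorted(found))
else:
    print("Model does not use the DMMHA path inside MHA.")
```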