onnxruntime
293a5ac5 - Make DMMHA kernel inside MHA optional for Whisper (#25166)

### Description

This PR disables support for the `DecoderMaskedMultiHeadAttention` (DMMHA) kernel inside `MultiHeadAttention` (MHA) by default, making it opt-in.

### Motivation and Context

Models that contain the extra inputs for DMMHA (i.e. `past_sequence_length` and `cache_indirection`) have some runtime issues. Additionally, not all execution providers implement the DMMHA kernel inside MHA, and those that do not will therefore not support these extra inputs.
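Whether an exported decoder model relies on the DMMHA path can be checked by looking for the extra graph inputs named above. Below is a minimal sketch using the `onnx` Python package; the model path `whisper_decoder.onnx` is a placeholder, not a file produced by this PR.

```python
# Minimal sketch: detect whether an exported model carries the extra
# DMMHA inputs (`past_sequence_length`, `cache_indirection`).
# The model path is hypothetical; substitute your own Whisper decoder export.
import onnx

DMMHA_INPUTS = {"past_sequence_length", "cache_indirection"}

model = onnx.load("whisper_decoder.onnx")
graph_inputs = {inp.name for inp in model.graph.input}

found = DMMHA_INPUTS & graph_inputs
if found:
    # Such models depend on the DMMHA kernel inside MHA, which this
    # commit makes opt-in rather than enabled by default.
    print("Model carries DMMHA inputs:", sorted(found))
else:
    print("Model does not use the DMMHA path inside MHA.")
```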