Add --use_multi_head_attention in transformers fusion (#14198)
Add an option --use_multi_head_attention to fuse the model with the
MultiHeadAttention operator instead of the Attention operator, for
testing purposes.
Note that MultiHeadAttention can be used for both self-attention and
cross-attention, while the Attention operator supports self-attention
only. The Attention operator packs the Q/K/V weights for the input
projection, whereas that input-projection MatMul is excluded from
MultiHeadAttention.
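For reference, a minimal sketch of enabling this option through the Python fusion API, assuming the CLI flag maps to a `FusionOptions` attribute of the same name; the model path, `num_heads`, and `hidden_size` values are placeholders:

```python
from onnxruntime.transformers.fusion_options import FusionOptions
from onnxruntime.transformers.optimizer import optimize_model

# Request MultiHeadAttention fusion instead of Attention fusion.
# Assumption: --use_multi_head_attention corresponds to a FusionOptions
# attribute with the same name.
options = FusionOptions("bert")
options.use_multi_head_attention = True

# Placeholder model path and shape parameters for illustration.
optimized = optimize_model(
    "model.onnx",
    model_type="bert",
    num_heads=12,
    hidden_size=768,
    optimization_options=options,
)
optimized.save_model_to_file("model_fused_mha.onnx")
```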