[DML EP] Attention Kernel (#13371)
### Description
This PR adds a DML EP kernel for the com.microsoft.Attention operator, implemented via DML_Graph. References for this implementation:
1. [Hugging Face Attention for
BERT](https://github.com/huggingface/transformers/blob/310340d0d01929715b30863ee6f633974d75da16/src/transformers/models/bert/modeling_bert.py#L245-L284)
2. Chapter 3 of the O'Reilly book *Natural Language Processing with Transformers, Revised Edition*
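
For reviewers unfamiliar with the operator, below is a minimal NumPy sketch of the multi-head scaled dot-product attention that com.microsoft.Attention computes, following the Hugging Face BERT reference above. The fused QKV projection layout, shapes, and function names here are illustrative assumptions, not the DML kernel's exact internal tensor layout.

```python
import numpy as np

def attention(x, qkv_weight, qkv_bias, num_heads, mask=None):
    """x: (batch, seq, hidden); qkv_weight: (hidden, 3*hidden); qkv_bias: (3*hidden,)."""
    batch, seq, hidden = x.shape
    head_dim = hidden // num_heads

    # Fused input projection, then split into Q, K, V.
    qkv = x @ qkv_weight + qkv_bias                       # (batch, seq, 3*hidden)
    q, k, v = np.split(qkv, 3, axis=-1)

    def to_heads(t):  # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        return t.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = map(to_heads, (q, k, v))

    # Scaled dot-product attention with an optional additive mask.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    if mask is not None:
        scores = scores + mask                            # e.g. large negative values for padding
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over the key dimension
    out = probs @ v                                       # (batch, heads, seq, head_dim)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)
```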
This PR also:
- includes a very small fix for the QLinearSigmoid kernel: a temporary object is now stored in a named variable.
- enables four L2 transformer optimizations for the DML EP: LayerNorm, Gelu, MatMulScale, and Attention (a usage sketch follows this list).
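
As a hedged illustration of how these optimizations would be exercised, the snippet below runs a session with the DML EP and graph optimizations at the extended (L2) level, where the transformer fusions apply. The model path is a placeholder.

```python
import onnxruntime as ort

# Extended optimization level corresponds to L2 graph transforms.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

# Run the model on the DirectML execution provider.
sess = ort.InferenceSession("bert_model.onnx", sess_options=so,
                            providers=["DmlExecutionProvider"])
```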
### Motivation and Context
- Why is this change required? What problem does it solve?
Attention is one of the main operators used in Transformer-based models; supporting it contributes to the overall performance of the DML EP on Transformer models.
- If it fixes an open issue, please link to the issue here. N/A
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>