[DML EP] Attention Kernel (#13371)
### Description
This PR adds a DML EP kernel for the com.microsoft.Attention operator, implemented via DML_Graph. References for this implementation:
1. [Hugging Face Attention for
BERT](https://github.com/huggingface/transformers/blob/310340d0d01929715b30863ee6f633974d75da16/src/transformers/models/bert/modeling_bert.py#L245-L284)
2. Chapter 3 of the O'Reilly book *Natural Language Processing with Transformers, Revised Edition*
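
For reviewers unfamiliar with the operator, below is a minimal NumPy sketch of the multi-head scaled dot-product attention that com.microsoft.Attention computes, following the Hugging Face BERT reference above. The fused QKV projection layout, shapes, and function names here are illustrative assumptions, not the DML kernel's exact internal tensor layout.

```python
import numpy as np

def attention(x, qkv_weight, qkv_bias, num_heads, mask=None):
    """x: (batch, seq, hidden); qkv_weight: (hidden, 3*hidden); qkv_bias: (3*hidden,)."""
    batch, seq, hidden = x.shape
    head_dim = hidden // num_heads

    # Fused input projection, then split into Q, K, V.
    qkv = x @ qkv_weight + qkv_bias                       # (batch, seq, 3*hidden)
    q, k, v = np.split(qkv, 3, axis=-1)

    def to_heads(t):  # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        return t.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = map(to_heads, (q, k, v))

    # Scaled dot-product attention with an optional additive mask.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    if mask is not None:
        scores = scores + mask                            # e.g. large negative values for padding
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over the key dimension
    out = probs @ v                                       # (batch, heads, seq, head_dim)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)
```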
This PR also:
- includes a very small fix for the QLinearSigmoid kernel: a temporary object is now stored in a named variable.
- enables four L2 transformer optimizations for the DML EP: LayerNorm, Gelu, MatMulScale, and Attention (a usage sketch follows this list).
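
As a hedged illustration of how these optimizations would be exercised, the snippet below runs a session with the DML EP and graph optimizations at the extended (L2) level, where the transformer fusions apply. The model path is a placeholder.

```python
import onnxruntime as ort

# Extended optimization level corresponds to L2 graph transforms.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

# Run the model on the DirectML execution provider.
sess = ort.InferenceSession("bert_model.onnx", sess_options=so,
                            providers=["DmlExecutionProvider"])
```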
### Motivation and Context
- Why is this change required? What problem does it solve?
Attention is one of the main operators used in Transformer-based models; supporting it contributes to the overall performance of the DML EP on Transformer models.
- If it fixes an open issue, please link to the issue here. N/A
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>