Add Attention Fusion Transformer (#2445)
Add an Attention Fusion Transformer that fuses the multi-head self-attention subgraph into a single node to optimize BERT model inference performance.
It supports BERT models exported from PyTorch. It fuses roughly 20 nodes into one Attention node, which can significantly improve BERT inference speed.
Supports a symbolic first dimension (batch size) in the input shape.
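For reference, the computation the fused Attention node replaces can be sketched in numpy. This is an illustrative sketch of standard multi-head self-attention, not the transformer's actual implementation; the function and parameter names are hypothetical:

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # Hypothetical reference implementation of the multi-head
    # self-attention subgraph that the fusion collapses into one node.
    batch, seq, hidden = x.shape
    head_dim = hidden // num_heads

    def split_heads(t):
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        return t.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))

    # Scaled dot-product attention per head, with a numerically
    # stable softmax over the last (key) axis.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    ctx = probs @ v  # (batch, heads, seq, head_dim)
    ctx = ctx.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)
    return ctx @ w_o

# The batch dimension is symbolic: the same graph handles any batch size.
x = np.random.rand(2, 4, 8).astype(np.float32)
weights = [np.random.rand(8, 8).astype(np.float32) for _ in range(4)]
out = multi_head_self_attention(x, *weights, num_heads=2)
print(out.shape)  # (2, 4, 8)
```

Running all of these reshape, transpose, matmul, and softmax steps as separate graph nodes is what produces the ~20 nodes that the fusion replaces with a single Attention node.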