Add Attention Fusion Transformer (#2445)
Add an Attention Fusion Transformer that fuses the multi-head self-attention subgraph into a single node to optimize BERT model inference performance.
It supports BERT models exported from PyTorch. It fuses roughly 20 nodes into one Attention node, which can significantly improve BERT inference speed.
Supports a symbolic first dimension (batch size) in the input shape.
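For reference, the computation the fused Attention node replaces can be sketched in numpy. This is an illustrative sketch of standard multi-head self-attention, not the transformer's actual implementation; the function and parameter names are hypothetical:

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # Hypothetical reference implementation of the multi-head
    # self-attention subgraph that the fusion collapses into one node.
    batch, seq, hidden = x.shape
    head_dim = hidden // num_heads

    def split_heads(t):
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        return t.reshape(batch, seq, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))

    # Scaled dot-product attention per head, with a numerically
    # stable softmax over the last (key) axis.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    ctx = probs @ v  # (batch, heads, seq, head_dim)
    ctx = ctx.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)
    return ctx @ w_o

# The batch dimension is symbolic: the same graph handles any batch size.
x = np.random.rand(2, 4, 8).astype(np.float32)
weights = [np.random.rand(8, 8).astype(np.float32) for _ in range(4)]
out = multi_head_self_attention(x, *weights, num_heads=2)
print(out.shape)  # (2, 4, 8)
```

Running all of these reshape, transpose, matmul, and softmax steps as separate graph nodes is what produces the ~20 nodes that the fusion replaces with a single Attention node.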