onnxruntime
9407c327 - GPT-2 attention fusion for transformers >= 4.27 (#16461)

2 years ago
### Description

Before transformers 4.27, the causal mask used the uint8 data type, so the exported graph contained an extra Cast node to convert it to bool. This change adds a pattern without the Cast node, so that attention fusion also works for GPT-2 models exported with transformers >= 4.27.

### Motivation and Context

https://github.com/microsoft/onnxruntime/issues/16453
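### Illustrative sketch

The idea of accepting the mask subgraph both with and without the Cast node can be sketched roughly as follows. This is a toy model, not the onnxruntime fusion code: the graph representation, the helper names (`match_parent_path`, `match_causal_mask`), and the exact op sequence `Cast -> Slice -> Slice` feeding `Where` are assumptions made for illustration.

```python
# Toy illustration only: the graph layout, helper names, and op sequence
# below are hypothetical stand-ins, not the real onnxruntime fusion code.
from typing import Optional

# Each entry maps a node name to (op_type, list of input node names).
Graph = dict[str, tuple[str, list[str]]]


def match_parent_path(graph: Graph, start: str, op_types: list[str]) -> Optional[list[str]]:
    """Follow the first input upward from `start`, requiring each parent's
    op type to equal the next entry in `op_types`. Return the matched node
    names in order, or None if the path does not match."""
    path = []
    current = start
    for expected in op_types:
        _, inputs = graph[current]
        if not inputs:
            return None
        parent = inputs[0]
        if parent not in graph or graph[parent][0] != expected:
            return None
        path.append(parent)
        current = parent
    return path


def match_causal_mask(graph: Graph, where_node: str) -> Optional[list[str]]:
    """Try the older pattern (uint8 mask, Cast to bool) first, then fall
    back to the newer pattern (bool mask, no Cast)."""
    path = match_parent_path(graph, where_node, ["Cast", "Slice", "Slice"])
    if path is not None:
        return path
    return match_parent_path(graph, where_node, ["Slice", "Slice"])


# Assumed shape of a graph exported with transformers < 4.27 (uint8 mask).
old_graph: Graph = {
    "mask": ("Constant", []),
    "slice1": ("Slice", ["mask"]),
    "slice2": ("Slice", ["slice1"]),
    "cast": ("Cast", ["slice2"]),
    "where": ("Where", ["cast"]),
}

# Assumed shape of a graph exported with transformers >= 4.27 (bool mask).
new_graph: Graph = {
    "mask": ("Constant", []),
    "slice1": ("Slice", ["mask"]),
    "slice2": ("Slice", ["slice1"]),
    "where": ("Where", ["slice2"]),
}

old_match = match_causal_mask(old_graph, "where")
new_match = match_causal_mask(new_graph, "where")
```

With only the first pattern, the newer graph would not match and the attention subgraph would be left unfused; the fallback is what lets both export styles be recognized.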