GPT-2 attention fusion for transformers >= 4.27 (#16461)
### Description
Before transformers 4.27, the causal mask uses the uint8 data type, so an extra Cast node converts it to bool. This PR adds a pattern without the Cast node so that attention fusion also works for GPT-2 models exported with transformers >= 4.27.
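As a rough illustration (a hypothetical sketch, not the actual onnxruntime fusion code), the fix amounts to treating the Cast node as optional when matching the mask subgraph. The op names below (`Slice`, `Cast`, `Where`) are assumptions about the exported mask path for illustration only:

```python
# Hypothetical sketch: match the causal-mask path either with the Cast node
# (transformers < 4.27, uint8 mask cast to bool) or without it
# (transformers >= 4.27, mask is already bool).

OLD_PATTERN = ["Slice", "Cast", "Where"]  # mask -> Cast-to-bool -> Where
NEW_PATTERN = ["Slice", "Where"]          # mask is bool, no Cast needed


def match_mask_path(op_types):
    """Return True if the sequence of op types along the mask path
    matches either the pre-4.27 or the post-4.27 pattern."""
    return op_types in (OLD_PATTERN, NEW_PATTERN)
```

With both variants accepted, models exported by either transformers version hit the same fusion.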
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16453