Add Bert/GPT2 fusion change for new attribute mask_filter_value in ORT optimizer (#14333)
### Description
These changes add Bert/GPT2 fusion support for specifying the new
`mask_filter_value` attribute on the fused Attention node. Note, however,
that the ORT optimizer currently cannot fuse
SkipLayerNorm/Attention/EmbedLayerNorm in models exported from the most
recent transformers releases, so this PR only addresses the issue for
models exported with older versions (e.g. the one used in the unit test).
### Motivation and Context
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>