Fix SkipLayerNorm fusion in transformer optimizer (#17320)
### Description
This PR fixes three issues:
(1) When the output of the Add node before LayerNormalization is a graph
output, SkipLayerNormalization should produce it as an output, but it
currently does not.
(2) When there is a Cast before the bias Add, the Cast output (instead of
its input) should be used as the SkipLayerNormalization input.
(3) The skip input is not the second input of the fused node, although the
op spec requires skip to be the second input. This could cause issues
once skip broadcasting support is added.
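The three fixes can be illustrated with a minimal, dependency-free Python sketch. It is not the optimizer's actual implementation: `Node`, `fuse_skip_layer_norm`, and the `producers`/`consumers` dicts are hypothetical stand-ins for ONNX protos and the optimizer's graph helpers. It assumes the residual Add is `Add(residual, branch)` and places `branch` in the `skip` slot; which tensor plays `skip` in the real fusion is an assumption here, and the sketch only guarantees that it lands at index 1. Input and output order follow the `com.microsoft` SkipLayerNormalization contrib op as we understand it: inputs `input, skip, gamma, beta, bias`; outputs `output, mean, inv_std_var, input_skip_bias_sum`.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto (illustration only)."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


def fuse_skip_layer_norm(
    add: Node,
    ln: Node,
    producers: Dict[str, Node],
    consumers: Dict[str, List[Node]],
    graph_outputs: Set[str],
    initializers: Set[str],
) -> Node:
    """Fuse [bias Add ->] residual Add -> LayerNormalization (hypothetical sketch)."""
    # Assumption: the residual Add is Add(residual, branch).
    residual, branch = add.inputs
    bias = ""

    # Fix (2): if the branch comes from a bias Add, take its non-bias input.
    # For MatMul -> Cast -> Add(bias) that is the Cast *output*; using the
    # Cast input instead would silently drop the Cast from the fused graph.
    bias_add = producers.get(branch)
    if bias_add is not None and bias_add.op_type == "Add":
        a, b = bias_add.inputs
        if b in initializers:
            branch, bias = a, b
        elif a in initializers:
            branch, bias = b, a

    # Fix (3): follow the op spec order: input, skip, gamma, beta[, bias].
    # Assumes LayerNormalization has both scale (gamma) and bias (beta) inputs.
    gamma, beta = ln.inputs[1], ln.inputs[2]
    inputs = [residual, branch, gamma, beta] + ([bias] if bias else [])

    # Fix (1): if the residual sum is a graph output (or has other consumers),
    # expose it as the 4th output, input_skip_bias_sum; the optional mean and
    # inv_std_var outputs in between are left unnamed ("").
    sum_name = add.outputs[0]
    other_users = [n for n in consumers.get(sum_name, []) if n is not ln]
    outputs = [ln.outputs[0]]
    if sum_name in graph_outputs or other_users:
        outputs += ["", "", sum_name]

    return Node("SkipLayerNormalization", inputs, outputs)


# Usage: MatMul -> Cast -> Add(bias) -> Add(residual) -> LayerNormalization,
# where the residual sum "sum" is also a graph output (item (1) above).
matmul = Node("MatMul", ["h", "W"], ["mm_out"])
cast = Node("Cast", ["mm_out"], ["cast_out"])
bias_add = Node("Add", ["cast_out", "B"], ["biased"])
res_add = Node("Add", ["residual", "biased"], ["sum"])
ln = Node("LayerNormalization", ["sum", "gamma", "beta"], ["ln_out"])

nodes = [matmul, cast, bias_add, res_add, ln]
producers = {out: n for n in nodes for out in n.outputs}
consumers: Dict[str, List[Node]] = {}
for n in nodes:
    for name in n.inputs:
        consumers.setdefault(name, []).append(n)

fused = fuse_skip_layer_norm(
    res_add, ln, producers, consumers,
    graph_outputs={"sum", "ln_out"},
    initializers={"W", "B", "gamma", "beta"},
)
print(fused.inputs)   # ['residual', 'cast_out', 'gamma', 'beta', 'B']
print(fused.outputs)  # ['ln_out', '', '', 'sum']
```

The sketch only builds the fused node. It does not remove the replaced nodes or check that the bias Add has no other consumers, both of which a real fusion must handle.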
### Motivation and Context
Fusion for the CLIP model of SDXL failed because the last hidden state is a
graph output.