Improve SimplifiedLayerNorm by using the same techniques as SkipSimplifiedLayerNorm (#25850)
### Description
Use shaders similar to those of SkipSimplifiedLayerNorm in SimplifiedLayerNorm
to fix the performance issues with SimplifiedLayerNorm.
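
For context, here is a minimal CPU reference sketch of what the two operators compute, assuming the usual RMS-normalization definition (no mean subtraction, no bias). It is not the shader code from this change; the function names and the default `epsilon` are illustrative assumptions.

```python
import math


def simplified_layer_norm(x, scale, epsilon=1e-5):
    """Reference RMS normalization of one row (illustrative, not the shader)."""
    # Mean of squares over the normalized axis; no mean is subtracted.
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + epsilon)
    # Scale each normalized element by its per-channel weight.
    return [v * inv_rms * s for v, s in zip(x, scale)]


def skip_simplified_layer_norm(x, skip, scale, epsilon=1e-5):
    """Add the skip (residual) input first, then normalize the sum."""
    summed = [a + b for a, b in zip(x, skip)]
    return simplified_layer_norm(summed, scale, epsilon), summed
```

Apart from the residual add, both operators perform the same reduction and scaling per row, which is consistent with the two being able to share a similar shader structure.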
### Motivation and Context
Prior to this change, generation in Bitnet was bottlenecked by
SimplifiedLayerNorm:
<img width="332" height="378" alt="image"
src="https://github.com/user-attachments/assets/3bc16ac1-ef7d-46bf-b403-92fc9192a2df"
/>
With this change, SimplifiedLayerNorm performance now matches that of
SkipSimplifiedLayerNorm:
<img width="699" height="179" alt="image"
src="https://github.com/user-attachments/assets/30009d85-d5d9-4585-987a-b39ecf52e0b5"
/>