Add full broadcasting support to LayerNormalization and RMSNormalization (#26613)
### Description
This PR adds full and spec-compliant broadcasting support to both
LayerNormalization and RMSNormalization.
Previously, onnxruntime supported only a partial set of broadcasting
cases (based on the logic introduced in PR #23297).
That implementation did not cover all valid broadcasting scenarios.
This PR introduces a complete generic broadcasting path, following the
[ONNX specification
rules](https://github.com/onnx/onnx/blob/main/docs/Broadcasting.md).
The previous implementation is preserved as a fast path and is still
used whenever the Scale/Bias shapes match directly.
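
As a rough illustration of the rule the generic path has to honor, the sketch below checks whether a Scale or Bias shape can be unidirectionally broadcast to the shape of X. The function name `IsBroadcastableTo` is hypothetical and is not taken from this PR; it assumes the standard ONNX rule that shapes are aligned from the right and each parameter dimension is either 1 or equal to the corresponding X dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not the code added in this PR.
// Returns true if `param_shape` (Scale or Bias) can be unidirectionally
// broadcast to `x_shape`: dimensions are aligned from the right, and each
// parameter dimension must be 1 or equal to the matching X dimension.
bool IsBroadcastableTo(const std::vector<int64_t>& param_shape,
                       const std::vector<int64_t>& x_shape) {
  // A parameter with higher rank than X can never broadcast to X.
  if (param_shape.size() > x_shape.size()) return false;

  const std::size_t offset = x_shape.size() - param_shape.size();
  for (std::size_t i = 0; i < param_shape.size(); ++i) {
    const int64_t p = param_shape[i];
    const int64_t x = x_shape[offset + i];
    if (p != 1 && p != x) return false;
  }
  return true;
}
```

Under these assumptions, a Scale of shape `{3, 1}` or `{4}` is accepted for an X of shape `{2, 3, 4}`, while `{5}` is rejected.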
Main changes:
- Extended broadcasting logic in:
  - layer_norm_helper.h
  - layer_norm_impl.cc
- Added full support for all valid broadcasting configurations of Scale
and Bias.
- Preserved the previous partial logic as a fast path for exact-match cases.
- Added comprehensive tests to:
  - layer_norm_op_test.cc
  - rms_norm_op_test.cc
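
To make "all valid broadcasting configurations" concrete, here is a minimal sketch of how a generic path could map each element of X to the Scale or Bias element it reads. `BroadcastOffset` is a hypothetical name, not a function from this PR, and the sketch assumes row-major layout, non-zero dimensions, and a parameter shape that is already known to be broadcastable to the shape of X.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not the code added in this PR.
// Maps a flat row-major index into X to the flat index of the Scale/Bias
// element it broadcasts from. Assumes all dimensions are non-zero and that
// `param_shape` is broadcastable to `x_shape` (aligned from the right).
std::size_t BroadcastOffset(std::size_t x_index,
                            const std::vector<int64_t>& x_shape,
                            const std::vector<int64_t>& param_shape) {
  const std::size_t offset = x_shape.size() - param_shape.size();
  std::size_t param_index = 0;
  std::size_t param_stride = 1;

  // Walk dimensions from innermost to outermost, peeling off one
  // coordinate of X at a time.
  for (std::size_t d = x_shape.size(); d-- > 0;) {
    const std::size_t x_dim = static_cast<std::size_t>(x_shape[d]);
    const std::size_t coord = x_index % x_dim;
    x_index /= x_dim;

    if (d >= offset) {
      const std::size_t p_dim =
          static_cast<std::size_t>(param_shape[d - offset]);
      // A parameter dimension of size 1 is broadcast: it always uses
      // coordinate 0 and so contributes nothing to the offset.
      if (p_dim != 1) param_index += coord * param_stride;
      param_stride *= p_dim;
    }
  }
  return param_index;
}
```

For an X of shape `{2, 3, 4}`, the element at coordinates `(1, 2, 3)` has flat index 23; with a Scale of shape `{3, 1}` it reads Scale element 2, and with a Scale of shape `{4}` it reads element 3. A production kernel would more likely precompute strides once than recompute them per element, so this only shows the index arithmetic.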
### Motivation and Context
Before this fix, some valid ONNX broadcasting shapes were rejected in
LayerNormalization and RMSNormalization.
This PR brings the operators into full alignment with the ONNX
specification and fixes models that previously failed due to incomplete
broadcasting support.
Fixes #26432
Fixes #18184