Generalize noop_with_empty_axes handling across all Reduce operators (#26436)
### Description
<!-- Describe your changes. -->
This PR fixes the behavior of the reduction operators so it's aligned
with the ONNX specification.
[See ONNX ReduceSumSquare
Specification](https://onnx.ai/onnx/operators/onnx__ReduceSumSquare.html)
for the definition of noop_with_empty_axes and expected behavior when
axes=[] or when the axes input is not provided.
Main changes:
- Added a new helper function ApplyNoopEmptyAxesElementwise<AGG>() to
handle the case axes=[] and noop_with_empty_axes=1 for all reduction
operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum,
ReduceLogSumExp, etc.).
This function performs elementwise operations according to the
aggregator type:
If the aggregator defines Pre/Post operations (e.g., abs, square, sqrt,
log), they are applied elementwise on each element of the input without
reduction.
Otherwise, a direct memory copy (memcpy) is performed to efficiently
produce an identical output.
- Introduced a compile-time trait system ReduceAggTraits to detect
whether each aggregator defines a PreOp and/or PostOp.
This allows compile-time specialization and avoids redundant runtime
checks.
- Updated the generic reduction paths (CommonReduce1Loop and
CommonReduce2Loops)
to invoke ApplyNoopEmptyAxesElementwise<AGG>() when axes=[] and
noop_with_empty_axes=1.
These paths are used by all reduction operators, ensuring consistent
handling across the entire operators.
- Fixed the conditional inside CommonFastReduceCopy: replaced the
previous unconditional memcpy and return true with a safe return false,
since empty-axes cases are now fully handled upstream by
ApplyNoopEmptyAxesElementwise<AGG>().
This keeps the control flow explicit and prevents unintended fallback
copies.
- Added unit tests for all affected reduction operators
(ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp),and
corrected one test that previously expected identity output but now
correctly applies Pre/Post operations per the ONNX specification.
These updates ensure spec-compliant behavior for all Reduce ops and
slightly improve performance for identity cases.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
-->Before this fix, some Reduce operators (e.g., ReduceSumSquare,
ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp) did not follow the
ONNX spec when axes=[] and noop_with_empty_axes=1.
Per the ONNX specification:
No reduction should occur if axes is empty and noop_with_empty_axes=1.
Operators with Pre/Post (e.g., abs, square, sqrt, log) must apply them
elementwise.
Others should return the input unchanged.
Fixes #26288
Fixes #25095
**Note:** This fix currently applies to the CPU Execution Provider only.
Other EPs will need a similar update to align with this behavior.
<!-- -If it fixes an open issue, please link to the issue here. -->