Enable reduce ops for opset 18 (#18053)
### Description
Opset 18 applies the "axes as input" change, which ReduceSum had already
received, to all the other reduce ops. Our CUDA kernels already support
this, but we had not enabled them for opset 18. This PR updates the
reduce ops' kernel registration to enable the "axes as input" behavior
for opset 18.
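To make the behavioral difference concrete, here is a minimal Python sketch of how one shared reduce kernel might resolve its axes. Before opset 18, axes come from the `axes` attribute. From opset 18, they come from the optional second input, with `noop_with_empty_axes` deciding what an empty axes list means. `resolve_reduce_axes` is a hypothetical helper, not an ORT or ONNX API. The opset-18 cutoff applies to the reduce ops other than ReduceSum, which made this change earlier.

```python
from typing import Optional, Sequence, Tuple


def resolve_reduce_axes(
    opset: int,
    rank: int,
    axes_attr: Optional[Sequence[int]] = None,
    axes_input: Optional[Sequence[int]] = None,
    noop_with_empty_axes: int = 0,
) -> Tuple[int, ...]:
    """Hypothetical helper: return the sorted, non-negative axes to reduce.

    Models the "axes as input" change for reduce ops other than ReduceSum:
    opset < 18 reads the 'axes' attribute, opset >= 18 reads the optional
    second input. Duplicate axes are simply collapsed in this sketch.
    """
    if opset >= 18:
        if axes_attr is not None:
            raise ValueError("opset 18 reduce ops take axes as an input, not an attribute")
        axes = list(axes_input) if axes_input is not None else []
        if not axes:
            # Missing/empty axes: identity if noop_with_empty_axes, else reduce all.
            return () if noop_with_empty_axes else tuple(range(rank))
    else:
        if axes_input is not None:
            raise ValueError("pre-opset-18 reduce ops take axes as an attribute")
        axes = list(axes_attr) if axes_attr is not None else []
        if not axes:
            # Older versions have no noop_with_empty_axes: missing axes means reduce all.
            return tuple(range(rank))

    normalized = set()
    for a in axes:
        if not -rank <= a < rank:
            raise ValueError(f"axis {a} out of range for rank {rank}")
        normalized.add(a + rank if a < 0 else a)  # map negative axes to [0, rank)
    return tuple(sorted(normalized))
```

For example, `resolve_reduce_axes(13, 3, axes_attr=[-1])` and `resolve_reduce_axes(18, 3, axes_input=[-1])` both resolve to `(2,)`. The downstream reduction can therefore stay identical across versions, and only the source of the axes changes.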
As part of the fix, I also simplified the reduce op kernel registration.
ORT doesn't require a kernel definition to exactly match the ONNX op
definition. In our case, all the reduce ops share the same kernel from
version 1 to version 18, so we don't need to maintain a separate kernel
definition for each version; a single kernel definition can cover
multiple versions. In some cases this registers more types for legacy
versions than their schemas allow, but that is harmless: the framework
validates the graph against the op schema, not against the kernel
definition.
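The reasoning above can be illustrated with a small, entirely hypothetical Python model of a kernel registry. This is not ORT's C++ registration API. The op name, version ranges, and type sets below are invented for illustration and are not the real ONNX constraints. One kernel definition spans versions 1 to 18 with the union of types, and schema validation runs before kernel lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Illustrative type sets and version ranges only; NOT real ONNX constraints.
LEGACY_TYPES: FrozenSet[str] = frozenset({"float", "double", "float16"})
NEWER_TYPES: FrozenSet[str] = LEGACY_TYPES | {"int8", "uint8"}

# Hypothetical schema table: (op_type, since_version, end_version) -> allowed types.
SCHEMAS: Dict[Tuple[str, int, int], FrozenSet[str]] = {
    ("ReduceMax", 1, 11): LEGACY_TYPES,
    ("ReduceMax", 12, 18): NEWER_TYPES,
}


@dataclass(frozen=True)
class KernelDef:
    op_type: str
    since_version: int
    end_version: int  # inclusive
    types: FrozenSet[str]


class KernelRegistry:
    def __init__(self) -> None:
        self._entries: List[Tuple[KernelDef, Callable[[str, str], str]]] = []

    def register(self, kdef: KernelDef, fn: Callable[[str, str], str]) -> None:
        self._entries.append((kdef, fn))

    def lookup(self, op_type: str, opset: int, dtype: str) -> Optional[Callable[[str, str], str]]:
        # Match on op name, version range, and the kernel's declared types.
        for kdef, fn in self._entries:
            if (kdef.op_type == op_type
                    and kdef.since_version <= opset <= kdef.end_version
                    and dtype in kdef.types):
                return fn
        return None


def schema_allows(op_type: str, opset: int, dtype: str) -> bool:
    for (name, lo, hi), types in SCHEMAS.items():
        if name == op_type and lo <= opset <= hi:
            return dtype in types
    return False


def shared_reduce_kernel(op_type: str, dtype: str) -> str:
    # Stand-in for the single reduce implementation shared by all versions.
    return f"{op_type}<{dtype}> via shared kernel"


registry = KernelRegistry()
# One definition covers versions 1..18 with the union of all types.
registry.register(KernelDef("ReduceMax", 1, 18, NEWER_TYPES), shared_reduce_kernel)


def run_node(op_type: str, opset: int, dtype: str) -> str:
    # Graph validation consults the schema first, so an extra type in the
    # kernel definition can never be selected for a legacy node.
    if not schema_allows(op_type, opset, dtype):
        raise ValueError(f"{op_type}-{opset} does not accept {dtype}")
    fn = registry.lookup(op_type, opset, dtype)
    if fn is None:
        raise LookupError(f"no kernel for {op_type}-{opset} ({dtype})")
    return fn(op_type, dtype)
```

In this model, `registry.lookup("ReduceMax", 11, "int8")` finds the kernel, yet `run_node("ReduceMax", 11, "int8")` is rejected by the schema check. The broader kernel type list therefore never changes which graphs are accepted.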
---------
Co-authored-by: Cheng Tang <chenta@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Cheng Tang <chenta@microsoft.com>