Address Resize kernel shortcomings (#28402)
This pull request improves the robustness and correctness of the
upsampling code in ONNX Runtime, especially for anti-aliased linear and
trilinear upsampling on the CPU. The changes focus on safer handling of
large tensor dimensions, improved memory safety, and code clarity for
interpolation and weight calculation. The most important changes are
grouped below.
**Dimension and Overflow Handling:**
* Added overflow checks for multiplication of large tensor dimensions to
prevent integer overflows during output size calculations, using a new
`checked_mul_int64` lambda.
[[1]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3R1106-R1118)
[[2]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3R1292-R1309)
* Ensured all tensor dimensions are validated to fit within the
`int32_t` range before narrowing, improving safety for large tensors.
[[1]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3R1133-R1143)
[[2]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3L1131-R1234)
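
The two checks above can be sketched as follows. This is a minimal illustration, not the code from the PR: the PR uses a `checked_mul_int64` lambda, while the free functions `CheckedMulInt64` and `NarrowToInt32` and their `std::optional` return types are hypothetical names and choices made for this example (C++17 assumed).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: multiplies two non-negative dimensions and returns
// std::nullopt if the product would not fit in int64_t.
inline std::optional<int64_t> CheckedMulInt64(int64_t a, int64_t b) {
  assert(a >= 0 && b >= 0);  // tensor dimensions are non-negative
  // Divide instead of multiplying so the check itself cannot overflow.
  if (a != 0 && b > std::numeric_limits<int64_t>::max() / a) {
    return std::nullopt;
  }
  return a * b;
}

// Hypothetical helper: narrows to int32_t only when the value is in range.
inline std::optional<int32_t> NarrowToInt32(int64_t value) {
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    return std::nullopt;
  }
  return static_cast<int32_t>(value);
}
```

Returning an empty optional lets the caller turn the failure into whatever error status the kernel reports, rather than silently wrapping around.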
**Anti-Alias Upsampling Refactor:**
* Refactored the anti-alias upsampling filter setup to use a new
`InterpolationBound` struct that holds the coordinate range for each
output pixel, replacing the previous flat vector approach. This improves
code clarity and reduces the risk of indexing errors.
[[1]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38R25-R35)
[[2]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38L126-R159)
[[3]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38L155-L208)
* Updated all interpolation and extrapolation routines to use the new
`bounds` structure, improving readability and maintainability.
[[1]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38L272-R290)
[[2]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38L341-R353)
[[3]](diffhunk://#diff-051136817a71a65f4763b9f5c6e02c15f9a6aa39189d952717f4f36c6490ee38L391-R400)
* Implemented CUDA NHWC cubic antialias support.
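
To illustrate the idea of a per-pixel bound struct replacing a flat vector of paired indices, here is a rough sketch. The field names `min` and `max`, the half-open range convention, and the `ComputeBounds` helper are assumptions for this example; the actual `InterpolationBound` layout in the PR may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: the half-open input range [min, max) that contributes
// to one output pixel.
struct InterpolationBound {
  int64_t min;
  int64_t max;
};

// Hypothetical helper: computes one bound per output pixel for a linear
// (triangle) filter with half-pixel centers.
inline std::vector<InterpolationBound> ComputeBounds(int64_t input_size,
                                                     int64_t output_size) {
  assert(input_size > 0 && output_size > 0);
  const double scale =
      static_cast<double>(input_size) / static_cast<double>(output_size);
  // When downsampling, the filter support widens with the scale factor.
  const double support = std::max(scale, 1.0);
  std::vector<InterpolationBound> bounds(static_cast<std::size_t>(output_size));
  for (int64_t i = 0; i < output_size; ++i) {
    const double center = (static_cast<double>(i) + 0.5) * scale;
    const auto lo = static_cast<int64_t>(std::floor(center - support + 0.5));
    const auto hi = static_cast<int64_t>(std::floor(center + support + 0.5));
    // Clamp to the valid input range so later indexing stays in bounds.
    bounds[static_cast<std::size_t>(i)] = {std::max<int64_t>(lo, 0),
                                           std::min<int64_t>(hi, input_size)};
  }
  return bounds;
}
```

With named fields, a consumer reads `bounds[i].min` and `bounds[i].max` instead of computing offsets such as `2 * i` and `2 * i + 1` into a flat vector, which is where off-by-one indexing mistakes tend to appear.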
**Memory and Type Safety:**
* Improved buffer management and type safety in filter weight
calculation, including more robust normalization and quantization for
int8/uint8 types.
* Fixed a minor logic bug in extrapolation handling: the loop is now
entered only when there are out-of-bounds indices.
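
As a rough picture of what weight normalization and integer quantization can look like, consider the sketch below. It is not the PR's implementation: `NormalizeWeights`, `QuantizeWeights`, and the `shift` parameter (number of fractional bits) are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: normalizes weights in place so they sum to 1.
// Returns false and leaves the weights untouched if the sum is zero or
// not finite, so the caller never divides by zero.
inline bool NormalizeWeights(std::vector<float>& weights) {
  double sum = 0.0;
  for (float w : weights) sum += w;
  if (sum == 0.0 || !std::isfinite(sum)) return false;
  for (float& w : weights) w = static_cast<float>(w / sum);
  return true;
}

// Hypothetical helper: converts normalized weights to fixed point with
// `shift` fractional bits, rounding to nearest.
inline std::vector<int32_t> QuantizeWeights(const std::vector<float>& weights,
                                            int shift) {
  assert(shift >= 0 && shift < 31);
  const double scale = static_cast<double>(int64_t{1} << shift);
  std::vector<int32_t> out;
  out.reserve(weights.size());
  for (float w : weights) {
    out.push_back(static_cast<int32_t>(std::lround(w * scale)));
  }
  return out;
}
```

Accumulating the sum in `double` and rejecting a zero or non-finite total are the kinds of guards that keep degenerate filters from producing NaN or garbage weights for int8/uint8 inputs.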
**General Code Improvements:**
* Added missing `<limits>` include and replaced some magic numbers with
named variables for clarity.
[[1]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3R6)
[[2]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3L1198-R1257)
[[3]](diffhunk://#diff-13eb8371a91e6fab62e63ecc46583049f97e8acc244af6ce8cc1c981d1d72dd3L1221-R1272)
These changes together make the upsampling code more robust, especially
for large or edge-case tensors, and improve maintainability for future
development.