CUDA Resize: add optimized 3D nearest resize kernel for 5D up/down sa… (#27578)
## Summary
This PR adds an optimized CUDA path for **nearest-neighbor 3D resize**
(index mapping and execution) in the Resize operator, along with targeted
regression coverage.
The implementation introduces a dedicated 3D fast path for nearest
resize to handle the last three spatial dimensions (`D/H/W`) efficiently
when outer dimensions are unchanged.
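
For readers unfamiliar with the mapping/compute split, the following is a rough host-side C++ sketch of the idea, not the CUDA code in this PR. It assumes the `asymmetric` coordinate transform with `floor` rounding and an output size of `floor(input * scale)`; the function names `BuildNearestMapping` and `ResizeNearest3D` are hypothetical and do not appear in `resize_impl.cu`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Mapping step (hypothetical helper): for every output index along one axis,
// precompute the nearest source index. Assumes asymmetric + floor.
std::vector<int64_t> BuildNearestMapping(int64_t in_size, int64_t out_size, float scale) {
  std::vector<int64_t> map(static_cast<size_t>(out_size));
  for (int64_t o = 0; o < out_size; ++o) {
    const int64_t src = static_cast<int64_t>(std::floor(static_cast<float>(o) / scale));
    map[static_cast<size_t>(o)] = std::min(src, in_size - 1);  // clamp to the last valid index
  }
  return map;
}

// Compute step (hypothetical helper): gather input values through the three
// per-axis tables. `outer` is the product of the unchanged leading dimensions
// (for example N * C for an NCDHW tensor).
std::vector<float> ResizeNearest3D(const std::vector<float>& x, int64_t outer,
                                   int64_t D, int64_t H, int64_t W,
                                   float scale_d, float scale_h, float scale_w) {
  const int64_t out_d = static_cast<int64_t>(std::floor(static_cast<float>(D) * scale_d));
  const int64_t out_h = static_cast<int64_t>(std::floor(static_cast<float>(H) * scale_h));
  const int64_t out_w = static_cast<int64_t>(std::floor(static_cast<float>(W) * scale_w));
  const std::vector<int64_t> map_d = BuildNearestMapping(D, out_d, scale_d);
  const std::vector<int64_t> map_h = BuildNearestMapping(H, out_h, scale_h);
  const std::vector<int64_t> map_w = BuildNearestMapping(W, out_w, scale_w);

  std::vector<float> y;
  y.reserve(static_cast<size_t>(outer * out_d * out_h * out_w));
  for (int64_t n = 0; n < outer; ++n) {
    for (int64_t d = 0; d < out_d; ++d) {
      for (int64_t h = 0; h < out_h; ++h) {
        for (int64_t w = 0; w < out_w; ++w) {
          const int64_t src = ((n * D + map_d[static_cast<size_t>(d)]) * H +
                               map_h[static_cast<size_t>(h)]) * W +
                              map_w[static_cast<size_t>(w)];
          y.push_back(x[static_cast<size_t>(src)]);
        }
      }
    }
  }
  return y;
}
```

Precomputing one small table per axis means the per-element work is reduced to three lookups and one load, which is the general motivation for separating mapping from compute.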
## What Changed
### CUDA Resize implementation
File: `onnxruntime/core/providers/cuda/tensor/resize_impl.cu`
- Added 3D nearest mapping kernel:
  - `_ResizeNearestMappingKernel3D`
- Added 3D nearest compute kernel:
  - `_ResizeNearestKernel3D`
- Added optimized 3D dispatch path in `ResizeNearestImpl`:
  - Enabled when:
    - `rank >= 3`
    - `coordinate_transformation_mode != tf_crop_and_resize`
    - all outer scales (except last 3 dims) are `1.0`

This keeps existing behavior unchanged for other cases while using the
optimized path for true 3D nearest resize workloads.
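
The three enabling conditions listed above can be expressed as a single predicate. The sketch below is illustrative only: `CanUseNearest3DFastPath` is a hypothetical name, and representing the coordinate transformation mode as a string is an assumption made for readability rather than how `ResizeNearestImpl` is actually written.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper mirroring the dispatch conditions in this PR description.
// `scales` holds one scale factor per tensor dimension, outermost first.
bool CanUseNearest3DFastPath(const std::vector<float>& scales,
                             const std::string& coordinate_transformation_mode) {
  const std::size_t rank = scales.size();
  if (rank < 3) return false;  // need at least D/H/W
  if (coordinate_transformation_mode == "tf_crop_and_resize") return false;
  // Every dimension other than the last three must be left unscaled.
  for (std::size_t i = 0; i + 3 < rank; ++i) {
    if (scales[i] != 1.0f) return false;
  }
  return true;
}
```

Under this sketch, a 5D input with scales `{1, 1, 2, 2, 2}` would take the fast path, while `{1, 2, 2, 2, 2}` would fall back to the generic path.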
### Regression tests
File: `onnxruntime/test/providers/cpu/tensor/resize_op_test.cc`
Added CUDA-targeted regression tests:
- `ResizeOpNearestUpSampleTest_5D_CudaRegression_Optimized3DMapping`
- `ResizeOpNearestDownSampleTest_5D_CudaRegression_Optimized3DMapping`
## Why
The previous nearest implementation relied on the generic path for these
3D scenarios. This change introduces a dedicated CUDA 3D path to improve
performance for 5D nearest resize workloads.
Fixes #14596