Fill Squeeze and Unsqueeze CUDA opset gaps to opset 25 (#27739)
### Description
Extends CUDA EP Squeeze and Unsqueeze kernel registrations from opset 23
to opset 25, matching CPU provider coverage.
- **`squeeze.cc` / `unsqueeze.cc`**: Cap opset 23 to versioned `23–23`,
add versioned `24–24`, add non-versioned `25`
- **`cuda_execution_provider.cc`**: Add corresponding forward
declarations and `BuildKernelCreateInfo` registry entries for opsets 23
(now versioned), 24, and 25
- **`docs/OperatorKernels.md`**: Update CUDA Squeeze and Unsqueeze
entries to reflect `25+` coverage with individual `24` and `23` version
rows
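
The effect of the three registrations can be illustrated with a small model of version-based kernel lookup. This is only a sketch: `KernelEntry`, `Matches`, `FindKernel`, `RegistryBefore`, and `RegistryAfter` are hypothetical names rather than ONNX Runtime APIs, and the matching rule (exact match on the start version, or containment in a closed versioned range) is an assumption about how lookup behaves. Only the entries this PR touches are modeled.

```cpp
#include <climits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one kernel registration; not an ONNX Runtime type.
struct KernelEntry {
  std::string op;
  int start;  // first opset version this registration covers
  int end;    // last opset version covered; INT_MAX marks a non-versioned entry
};

// Assumed matching rule: the node's since-version must equal the entry's start
// version, or fall inside a closed (versioned) range.
inline bool Matches(const KernelEntry& k, const std::string& op, int since_version) {
  if (k.op != op) return false;
  if (k.start == since_version) return true;
  return k.end != INT_MAX && k.start < since_version && since_version <= k.end;
}

// Returns the first matching entry, or std::nullopt when no kernel matches.
inline std::optional<KernelEntry> FindKernel(const std::vector<KernelEntry>& registry,
                                             const std::string& op, int since_version) {
  for (const auto& k : registry) {
    if (Matches(k, op, since_version)) return k;
  }
  return std::nullopt;
}

// Before this change: opset 23 is the newest, non-versioned registration.
inline std::vector<KernelEntry> RegistryBefore() {
  return {{"Squeeze", 23, INT_MAX}, {"Unsqueeze", 23, INT_MAX}};
}

// After this change: 23 and 24 are versioned, 25 is the non-versioned entry.
inline std::vector<KernelEntry> RegistryAfter() {
  return {{"Squeeze", 23, 23},   {"Squeeze", 24, 24},   {"Squeeze", 25, INT_MAX},
          {"Unsqueeze", 23, 23}, {"Unsqueeze", 24, 24}, {"Unsqueeze", 25, INT_MAX}};
}
```

Under these assumptions, `FindKernel(RegistryBefore(), "Squeeze", 24)` returns no entry, while the same lookup against `RegistryAfter()` resolves to the `24–24` registration.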
No new computation logic is needed: both ops only change the tensor shape
(the data movement is a single `cudaMemcpyAsync`), so the existing kernel
implementation covers all the new opsets.
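
As a rough host-side illustration of why one implementation can serve every opset, the sketch below computes the output shapes and copies the data unchanged. It is not the CUDA EP code: `SqueezeShape`, `UnsqueezeShape`, and `CopyData` are hypothetical helpers, `std::memcpy` stands in for `cudaMemcpyAsync`, and `float` is chosen arbitrarily as the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Shape = std::vector<int64_t>;

// Map a possibly negative axis into [0, rank).
inline int64_t NormalizeAxis(int64_t axis, int64_t rank) {
  return axis < 0 ? axis + rank : axis;
}

// Squeeze: drop the listed axes, each of which must have extent 1.
inline Shape SqueezeShape(const Shape& in, const std::vector<int64_t>& axes) {
  const int64_t rank = static_cast<int64_t>(in.size());
  std::vector<bool> drop(in.size(), false);
  for (int64_t a : axes) {
    const int64_t i = NormalizeAxis(a, rank);
    assert(i >= 0 && i < rank && in[static_cast<std::size_t>(i)] == 1);
    drop[static_cast<std::size_t>(i)] = true;
  }
  Shape out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!drop[i]) out.push_back(in[i]);
  }
  return out;
}

// Unsqueeze: insert extent-1 axes at the listed positions of the output shape.
inline Shape UnsqueezeShape(const Shape& in, const std::vector<int64_t>& axes) {
  const int64_t out_rank = static_cast<int64_t>(in.size() + axes.size());
  std::vector<bool> inserted(static_cast<std::size_t>(out_rank), false);
  for (int64_t a : axes) {
    const int64_t i = NormalizeAxis(a, out_rank);
    assert(i >= 0 && i < out_rank && !inserted[static_cast<std::size_t>(i)]);
    inserted[static_cast<std::size_t>(i)] = true;
  }
  Shape out;
  std::size_t src = 0;
  for (std::size_t i = 0; i < inserted.size(); ++i) {
    out.push_back(inserted[i] ? 1 : in[src++]);
  }
  return out;
}

// The element data is copied byte for byte; only the shape metadata differs.
// std::memcpy is a host-side stand-in for the device copy.
inline std::vector<float> CopyData(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  if (!in.empty()) std::memcpy(out.data(), in.data(), in.size() * sizeof(float));
  return out;
}
```

For example, `SqueezeShape({1, 3, 1, 5}, {0, -2})` yields `{3, 5}` and `UnsqueezeShape({3, 5}, {0, -1})` yields `{1, 3, 5, 1}`; in both cases the element buffer is identical before and after.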
### Motivation and Context
The CUDA EP registered Squeeze/Unsqueeze only up to opset 23, while the ONNX
spec defines them through opset 25. Models exported at opset 24+ would
fail to find a matching CUDA kernel. This is part of the broader opset gap
audit tracked in #27729.
### Limitation
This change does not add support for the newer data types (float8, float4,
int4, etc.). Those will be added later if needed.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>