Fill GlobalAveragePool and GlobalMaxPool opset gap in CUDA provider (1→22) (#27733)
### Description
Extends CUDA kernel registrations for `GlobalAveragePool` and
`GlobalMaxPool` from opset 1 only to the full opset 1–22 range. Follows
the same pattern used for `MaxPool` in #27715.
- **`core/providers/cuda/nn/pool.cc`** — Split single opset-1
registrations into versioned 1–21 + opset 22 for both NCHW and NHWC
variants
- **`core/providers/cuda/cuda_execution_provider.cc`** — Updated class
declarations and `BuildKernelCreateInfo` entries (versioned 1–21, added
opset 22)
- **`core/providers/cuda/cuda_nhwc_kernels.cc`** — Same for NHWC kernel
registrations
- **`test/providers/cpu/nn/pool_op_test.cc`** — Added
`GlobalAveragePool_22_CUDA` test
- **`docs/OperatorKernels.md`** — Updated GlobalAveragePool and
GlobalMaxPool entries from `1+` to `22+` / `[1, 21]` in both the ai.onnx
and com.microsoft.internal.nhwc domains under CUDAExecutionProvider
No functional changes to the kernel implementations: opsets 1 through 22
are spec-compatible for these ops.
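Since the kernels are untouched, the spec-compatibility claim reduces to the op's semantics being identical across opsets: GlobalAveragePool averages over all spatial dimensions, producing an N x C x 1 x ... x 1 output. A minimal reference sketch of that behavior (illustrative only, not the CUDA kernel; the function name and flattened-spatial layout are choices made here):

```cpp
#include <cstddef>
#include <vector>

// Reference semantics of GlobalAveragePool, unchanged from opset 1 to 22:
// for each (n, c) plane, average over all spatial elements. Input is NCHW
// (or NCDHW, etc.) with the spatial dims flattened into `spatial`.
std::vector<float> GlobalAveragePoolRef(const std::vector<float>& x,
                                        size_t n, size_t c, size_t spatial) {
  std::vector<float> y(n * c, 0.0f);
  for (size_t i = 0; i < n * c; ++i) {
    float sum = 0.0f;
    for (size_t s = 0; s < spatial; ++s) sum += x[i * spatial + s];
    y[i] = sum / static_cast<float>(spatial);  // one value per (n, c) plane
  }
  return y;
}
```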
### Motivation and Context
`GlobalAveragePool` and `GlobalMaxPool` were registered at opset 1 only
in the CUDA provider, creating a 21-version gap to the latest ONNX opset
22. Models exported at higher opsets would fail to find a matching CUDA
kernel. Identified as P1 gaps in #27729.
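The failure mode can be pictured with a toy versioned-registry sketch (the types and function below are hypothetical illustrations, not onnxruntime's actual registry API): a kernel is only dispatchable when the model's opset falls inside some registered `[start, end]` range, so splitting the registration into 1–21 plus 22+ closes the gap.

```cpp
#include <climits>
#include <map>
#include <string>

// Toy model of versioned kernel registration: each entry covers an
// inclusive opset range [start, end]; an open-ended "22+" registration
// uses INT_MAX as its end.
struct OpsetRange {
  int start;
  int end;
};

using Registry = std::multimap<std::string, OpsetRange>;

// True if some registered range for `op` covers `model_opset`.
bool HasKernel(const Registry& reg, const std::string& op, int model_opset) {
  auto range = reg.equal_range(op);
  for (auto it = range.first; it != range.second; ++it) {
    if (model_opset >= it->second.start && model_opset <= it->second.end)
      return true;
  }
  return false;
}
```

With only a `[1, 21]` entry, a model exported at opset 22 finds no match; adding the `[22, INT_MAX]` entry makes the lookup succeed without changing the kernel body.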
### Limitations
BF16 support for GlobalAveragePool-22 and GlobalMaxPool-22 is not added
in this PR.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>