Fill CUDA EP opset gap for GRU operator (14 → 22) (#27738)

### Description
Extends GRU CUDA kernel registration from opset 14 to opset 22,
following the same pattern as other recent opset gap fills (e.g.,
ConvTranspose in #27710).
- **`gru.cc`**: Cap existing opset-14 non-versioned kernel to versioned
14–21; add new non-versioned kernel at opset 22+
- **`cuda_execution_provider.cc`**: Update forward declarations and
`BuildKernelCreateInfo` entries for versioned 14–21 and non-versioned
22+
- **`deep_cpu_gru_op_test.cc`**: Add CUDA-specific test for GRU at opset
22 with `linear_before_reset=1` (cuDNN requirement)
- **`docs/OperatorKernels.md`**: Update CUDA provider GRU entry to
reflect `22+`, `[14, 21]`, and `[7, 13]` version ranges

No functional changes to the kernel implementation: the GRU spec is
unchanged between opsets 14 and 22.
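
The effect of splitting the registration can be illustrated with a small
standalone model. This is a sketch, not ONNX Runtime code:
`KernelVersionRange` and `Matches` are hypothetical names, and the matching
rule (an open-ended kernel matches only the since-version it was registered
at, while a versioned kernel matches any since-version inside its range) is
an assumption about how kernel lookup behaves.

```cpp
#include <climits>

// Hypothetical, simplified model of a kernel's registered opset range.
// end == INT_MAX marks an open-ended ("N+") registration.
struct KernelVersionRange {
  int start;
  int end;
};

// Assumed matching rule (not copied from ONNX Runtime):
//  - an exact match on the start version always succeeds;
//  - otherwise only a closed range [start, end] can match, and only if
//    the node's since-version falls inside it.
constexpr bool Matches(KernelVersionRange kernel, int node_since_version) {
  return kernel.start == node_since_version ||
         (kernel.start < node_since_version && kernel.end != INT_MAX &&
          kernel.end >= node_since_version);
}

// Before this change: a single open-ended registration at opset 14.
constexpr KernelVersionRange kGruBefore{14, INT_MAX};
// After this change: versioned 14-21 plus open-ended 22+.
constexpr KernelVersionRange kGru14To21{14, 21};
constexpr KernelVersionRange kGru22Plus{22, INT_MAX};

// A GRU node whose schema since-version is 22 finds no kernel before...
static_assert(!Matches(kGruBefore, 22), "old registration misses GRU-22");
// ...and is picked up by the new open-ended registration after.
static_assert(Matches(kGru22Plus, 22), "new registration covers GRU-22");
// GRU-14 nodes keep matching, now through the versioned kernel.
static_assert(Matches(kGruBefore, 14), "GRU-14 matched before");
static_assert(Matches(kGru14To21, 14), "GRU-14 still matched after");
static_assert(!Matches(kGru14To21, 22), "versioned kernel stops at 21");
```

Under this assumed rule, capping the old kernel at 21 keeps existing
GRU-14 models working, and the new 22+ entry is what lets GRU-22 nodes
resolve to a CUDA kernel.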
### Motivation and Context
CUDA EP registered GRU only up to opset 14, while ONNX defines GRU
through opset 22. Models exported at opset ≥15 would fail to find a
matching CUDA kernel and fall back to CPU. This is one of the P1 gaps
tracked in #27729.
### Limitation
A BF16 kernel is not added for GRU-22. It can be added later if needed.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>