onnxruntime
fde4e032 - [CUDA] Extend Pad support through opset 25 with wrap mode (#27774)

Commit

37 days ago

[CUDA] Extend Pad support through opset 25 with wrap mode (#27774)

### Description

This PR consolidates PRs #27416 and #27708 to extend CUDA Pad kernel support through opset 25, including wrap mode implementation.

### Motivation and Context

The CUDA execution provider previously only registered the Pad kernel up to opset 18 and did not implement wrap mode. When an ONNX model exported with opset 19+ was run on the CUDA executor, the Pad operation was forced to fall back to CPU, resulting in significant performance degradation.

This PR aligns CUDA Pad registration with the ONNX Pad schema evolution through opset 25 and provides a correct wrap mode implementation.

Related issues: https://github.com/microsoft/onnxruntime/issues/26393

Related PRs: #27416, #27708

### Summary of Changes

#### Kernel registration and opset coverage

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA Pad kernel registrations for opset ranges 18, 19-20, 21-22, 23, 24, and 25. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Registers the new Pad kernel versions in the CUDA EP registry under the existing per-opset sections. |

#### CUDA Pad implementation

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends the Pad kernel interface to pass effective sliced extents and per-axis input offsets. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Adds CUDA wrap mode using a `WrapCoordinate` device helper with `if constexpr` compile-time specialization. Removes dead wrap code from the NCHW-specialized kernel path. |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Computes effective sliced input extents/offsets for wrap behavior with negative pads. Bypasses the NCHW fast-path for wrap mode and routes through the generic implementation. |

#### Documentation

| File | Change |
|------|--------|
| `docs/OperatorKernels.md` | Updates the CUDA Pad kernel opset coverage to reflect the new version splits (25+, 24, 23, [21,22], [19,20], 18) up to opset 25. |

#### Test coverage

| File | Change |
|------|--------|
| `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Adds CUDA-only Pad coverage for `edge` across opsets 18-25 and `wrap` across opsets 19-25. Updates existing wrap test comment. |

### Checklist

- [x] Tests added/updated
- [x] No breaking changes

---------

Co-authored-by: Shirasawa <764798966@qq.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>

References

#27774 - [CUDA] Extend Pad support through opset 25 with wrap mode

Author

Copilot

Parents

45b5900d
