onnxruntime
8c4245e6 - CUDA Plugin Cleanup for Shared Kernel Helpers (#27915)

19 days ago
## Description

This PR reduces the amount of CUDA plugin-specific compatibility code by moving reusable validation and attribute-reading logic into shared helper paths that work for both bundled and plugin builds. It also fills in a missing allocator hook in the EP adapter so plugin kernels can reuse the same initialization path as the in-tree CUDA EP, which simplifies maintenance and improves behavior parity. The follow-up changes update the CUDA plugin design doc to reflect the new shared-helper model and add focused plugin regression tests for the two runtime paths that changed most materially.

## Summary of Changes

### EP adapter and shared helper extraction

| File | Change |
|------|--------|
| `ep/adapter/op_kernel_info.h` | Adds `OpKernelInfo::GetAllocator(OrtMemType)` so adapter-based kernels can request device or CPU temp allocators in plugin builds. |
| `cpu/tensor/scatter_nd.h` | Extracts shape validation into `scatter_nd_internal::ValidateShapes` so the same logic can be reused outside the CPU `ScatterND` class. |
| `cpu/tensor/space_depth_ops.h` | Moves blocksize parsing, mode parsing, and dimension validation into `space_depth_internal` helpers that can be shared by CUDA kernels. |

### CUDA kernel cleanup and plugin parity

| File | Change |
|------|--------|
| `cuda/tensor/scatter_nd.cc` | Removes the plugin-only `ScatterND` validation duplicate and reuses the shared helper implementation. |
| `cuda/tensor/scatter_nd.h` | Drops the old conditional include split now that validation is shared through the common helper path. |
| `cuda/tensor/space_depth_ops.h` | Deletes the plugin-only `SpaceToDepth`/`DepthToSpace` reimplementation and inherits from the shared base/helper logic in all builds. |
| `cuda/tensor/upsample.cc` | Reuses the normal antialias lookup-table allocation/caching path in plugin builds via the new allocator adapter support. |
| `cuda/tensor/upsample.h` | Keeps the persistent device lookup-table member available in plugin builds as well. |

### Shared-provider and diagnostics alignment

| File | Change |
|------|--------|
| `cpu/cpu_provider_shared.cc` | Routes shared-provider `ScatterND` shape validation through the extracted helper. |
| `provider_bridge_provider.cc` | Updates the bridge-side `ScatterND::ValidateShapes` implementation to call the shared helper directly. |
| `cuda/cudnn_common.h` | Preserves the batch-norm epsilon warning path in plugin builds instead of suppressing it. |
| `cuda/nn/conv.cc` | Removes plugin-specific shortened cuDNN frontend errors so bundled and plugin builds both include frontend JSON in failures. |
| `cuda/nn/conv_transpose.cc` | Extends cuDNN frontend failures to include frontend JSON for easier debugging, matching the `Conv` behavior. |

### Documentation and regression coverage

| File | Change |
|------|--------|
| `cuda_plugin_ep_design.md` | Updates the design doc to reflect that `ScatterND`, `SpaceDepth`, and `Upsample` now use shared adapter-safe helper paths instead of plugin-only fallback branches. |
| `test_cuda_plugin_ep.py` | Adds plugin regression coverage for antialias `Resize`/`Upsample` and `ScatterND`, covering the new allocator-backed lookup-table path and the shared `ScatterND` validation helper. |

## Testing

- Build with `onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON` and verify the affected CUDA provider sources compile without the removed plugin-only fallback paths.
- Run targeted CUDA provider coverage for `ScatterND`, `SpaceToDepth`/`DepthToSpace`, `Resize`/`Upsample`, `Conv`, and `ConvTranspose` in both plugin and bundled CUDA configurations.
- Confirm antialias upsample still initializes and uses the shared lookup table correctly in plugin builds.
- Run the new plugin tests for antialias `Resize` and `ScatterND` in `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py`.
- Confirm cuDNN frontend failure paths now emit the same diagnostic detail in plugin and non-plugin builds.

## Motivation and Context

The initial CUDA plugin enablement introduced several localized `#ifdef BUILD_CUDA_EP_AS_PLUGIN` branches and helper copies to get kernels compiling under the adapter path. This cleanup pays down that compatibility debt by extracting the truly shared pieces into reusable helpers and by teaching the adapter `OpKernelInfo` how to provide the allocators those kernels already expect. The result is less duplicated logic, fewer plugin-only code paths to keep in sync, and better debugging consistency between the plugin EP and the built-in CUDA EP.

## Checklist

- [x] Tests added/updated
- [x] Documentation updated (if applicable)
- [x] No breaking changes (or documented in description)
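
## Illustration: ScatterND shape rules

For readers unfamiliar with what a shared `ScatterND` shape-validation helper has to check, the following is a minimal Python sketch of the shape rules in the ONNX `ScatterND` operator specification. It is not the code in `scatter_nd_internal::ValidateShapes`; the function name `validate_scatter_nd_shapes` and the error messages are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def validate_scatter_nd_shapes(
    data_shape: Sequence[int],
    indices_shape: Sequence[int],
    updates_shape: Sequence[int],
) -> None:
    """Raise ValueError if the shapes violate the ONNX ScatterND shape rules.

    Hypothetical helper for illustration; not part of ONNX Runtime.
    """
    data_rank = len(data_shape)
    indices_rank = len(indices_shape)
    if data_rank < 1 or indices_rank < 1:
        raise ValueError("data and indices must both have rank >= 1")

    # The last dimension of indices is the length of each index tuple,
    # so it cannot exceed the rank of data.
    index_depth = indices_shape[-1]
    if index_depth > data_rank:
        raise ValueError(
            f"last dimension of indices ({index_depth}) exceeds data rank ({data_rank})"
        )

    # updates must be shaped as the leading (batch) dimensions of indices
    # followed by the data dimensions that an index tuple does not address.
    expected = tuple(indices_shape[:-1]) + tuple(data_shape[index_depth:])
    if tuple(updates_shape) != expected:
        raise ValueError(
            f"updates shape {tuple(updates_shape)} does not match expected {expected}"
        )
```

Keeping a check like this in one function that depends only on shapes is what lets the same logic be called from a CPU kernel, a CUDA kernel, and a provider bridge without duplicating it per build configuration.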