onnxruntime
d165fba0 - CUDA Plugin EP: NHWC Cleanup & Hardening (#28612)

Commit
34 days ago
CUDA Plugin EP: NHWC Cleanup & Hardening (#28612) ## Summary Unifies the NHWC-eligible op allowlist between the bundled CUDA EP and the CUDA plugin EP into a single shared header, adds kernel-miss diagnostics, and expands NHWC test coverage from 4 ops to 11. ## Motivation The bundled EP (`cuda_execution_provider.cc`) and the plugin EP (`plugin/cuda_ep.cc`) independently maintained their own copies of the NHWC allowlist. This created a maintenance hazard where ops could be added to one but not the other, leading to silent divergence. Additionally, there was no runtime diagnostic when the framework rewrote a node to the NHWC domain but the plugin EP lacked a matching kernel — failures were silent fallbacks to CPU. ## Key Changes ### Shared NHWC Allowlist (`cuda_nhwc_ops.h`) | Item | Detail | |------|--------| | New file | `onnxruntime/core/providers/cuda/cuda_nhwc_ops.h` | | Contents | `IsNhwcEligibleOnnxOp()`, `IsNhwcEligibleMsOp()`, `IsNhwcEligible()` inline functions | | Ops covered | AveragePool, BatchNormalization, Conv, ConvTranspose, DepthToSpace, GlobalAveragePool, GlobalMaxPool, GridSample, LRN, MaxPool, SpaceToDepth (+ MS-domain GridSample) | ### Bundled EP Refactor (`cuda_execution_provider.cc`) - Removed the static `std::unordered_set<std::string_view> cuda_nhwc_onnx_ops` and the inline domain check logic. - Replaced with a single call to `cuda::IsNhwcEligible(node_domain, node_op_type)`. ### Plugin EP Refactor & Diagnostics (`plugin/cuda_ep.cc`) - `ShouldConvertDataLayoutForOpImpl`: Replaced ~20 lines of static set + domain checks with a single `cuda::IsNhwcEligible()` call. - `GetCapabilityImpl`: Added a WARNING-level diagnostic in the `else` branch (kernel not found). When a node in the `com.ms.internal.nhwc` domain has no registered kernel, the log emits the op type, domain, version, and node name — making future NHWC registration gaps immediately visible at session creation. ### Expanded NHWC Test Coverage (`test_cuda_plugin_ep.py`) - Added `_assert_nhwc_domain_assigned()` helper that verifies NHWC layout transformation occurred by checking for framework-inserted Transpose nodes in the EP's assignment info. - Added `_run_nhwc_model_test()` helper combining domain assertion + numerical validation. - Updated 4 existing NHWC tests (Conv, BatchNormalization, MaxPool, AveragePool) to include structural assertions. - Added 7 new NHWC test methods: - `test_nhwc_conv_transpose` - `test_nhwc_global_max_pool` - `test_nhwc_global_average_pool` - `test_nhwc_depth_to_space` - `test_nhwc_space_to_depth` - `test_nhwc_lrn` - `test_nhwc_grid_sample` ## Testing Notes Run the full CUDA plugin EP test suite with NHWC enabled: ```bash bash .env/cuda13_plugin.sh --build --install --test_plugin ``` Or run only the NHWC tests directly: ```bash cd onnxruntime/test/python/transformers ORT_TEST_CUDA_PLUGIN_EP=1 python -m unittest \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_batch_normalization \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_maxpool \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_avgpool \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_conv_transpose \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_max_pool \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_global_average_pool \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_depth_to_space \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_space_to_depth \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_lrn \ test_cuda_plugin_ep.TestCudaPluginEP.test_nhwc_grid_sample ``` All 86 tests in the suite pass (11 NHWC + 75 existing), with no regressions.
Author
Parents
Loading