8b6b0b6d - Add LabelEncoder CUDA execution provider for numeric types (#28045)

### Description

Implements `ai.onnx.ml.LabelEncoder` on the CUDA execution provider for numeric key/value types, using sorted arrays plus binary search (O(log n) per element).

**New files** (`onnxruntime/core/providers/cuda/ml/`):

- `label_encoder_impl.cu` / `.h` — CUDA kernel: per-thread binary search on sorted keys, NaN-aware for float/double
- `label_encoder.cc` / `.h` — Host-side op classes (`CudaLabelEncoder` for opset 2-3, `CudaLabelEncoder_4` for opset 4+). The constructor sorts the keys and copies them to the GPU; `ComputeInternal` launches the kernel.

**Modified files**:

- `cuda_execution_provider.cc` — Register 11 kernel variants (4 versioned opset 2-3, 7 opset 4+)
- `provider_api.h` — Add the missing `kMLDomain` constant (this is the first ML-domain op on the CUDA EP)
- `docs/OperatorKernels.md` — Add an `ai.onnx.ml` section to the CUDA provider table

**Supported type combinations**:

| Opset | Types |
|-------|-------|
| 2-3   | `int64↔float`, `int64↔int64`, `float↔float` |
| 4+    | Above, plus `double↔double`, `double↔int64`, `int64↔double` |

String types remain CPU-only. NaN keys are placed at the end of the sorted key array and short-circuited before the binary search (see the sketches below).

**Tests**: 5 new test cases covering NaN-key-to-numeric-value mappings and the double type combinations. Existing numeric tests (`FloatToInt64Opset2`, `Int64ToFloatOpset2`, etc.) automatically run on CUDA via `OpTester::Run()`.
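The core of the operator is the device-side lookup. Below is a minimal sketch of what a NaN-aware, per-thread binary-search kernel could look like under the approach described above; the kernel name, parameter list, and the `has_nan_key`/`default_value` handling are illustrative assumptions, not the code from this commit:

```cuda
#include <math.h>

// One thread per input element: NaN inputs are short-circuited to the slot
// reserved at the end of the sorted key array (when a NaN key exists in the
// mapping); every other input does a lower-bound binary search on the keys.
template <typename TKey, typename TValue>
__global__ void LabelEncodeKernel(const TKey* input, TValue* output, size_t n,
                                  const TKey* sorted_keys, const TValue* values,
                                  size_t num_keys, bool has_nan_key,
                                  TValue default_value) {
  size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
  if (i >= n) return;

  TKey key = input[i];

  // NaN never compares equal to anything, so handle it before the search.
  if (isnan(static_cast<double>(key))) {
    output[i] = has_nan_key ? values[num_keys - 1] : default_value;
    return;
  }

  // Lower-bound binary search over the non-NaN portion of the sorted keys.
  size_t end = has_nan_key ? num_keys - 1 : num_keys;
  size_t lo = 0;
  size_t hi = end;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (sorted_keys[mid] < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  output[i] = (lo < end && sorted_keys[lo] == key) ? values[lo] : default_value;
}
```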
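For the host side, the description says the op constructor sorts the keys and copies them to the GPU. A hedged sketch of that preparation step is below, with any NaN key forced into the last slot so the kernel can exclude it from the search range; `SortKeysForLookup` is a hypothetical helper name used for illustration only:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Pair keys with their values, sort by key with NaN last, then split back into
// two flat arrays that are ready to copy to device memory.
template <typename TKey, typename TValue>
void SortKeysForLookup(std::vector<TKey>& keys, std::vector<TValue>& values) {
  std::vector<std::pair<TKey, TValue>> pairs(keys.size());
  for (size_t i = 0; i < keys.size(); ++i) {
    pairs[i] = {keys[i], values[i]};
  }

  std::sort(pairs.begin(), pairs.end(), [](const auto& a, const auto& b) {
    // Treat NaN as greater than every ordinary key so it sorts to the back.
    if (std::isnan(static_cast<double>(a.first))) return false;
    if (std::isnan(static_cast<double>(b.first))) return true;
    return a.first < b.first;
  });

  for (size_t i = 0; i < pairs.size(); ++i) {
    keys[i] = pairs[i].first;
    values[i] = pairs[i].second;
  }
}
```

Keeping the lookup table as two flat, pre-sorted buffers avoids any device-side hashing or dynamic allocation, which fits the O(log n) per-element cost quoted in the description.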
### Motivation and Context

Models with large LabelEncoder nodes (>100k entries) force a CPU round-trip when all other nodes run on the GPU. This change adds the CUDA implementation to eliminate that data-transfer bottleneck.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>