[CoreML EP] Support pre-opset-13 Split via 'split' attribute (#28270)
### Description
The CoreML `SplitOpBuilder` previously gated `GetMinSupportedOpSet` at
13 because pre-13 `Split` carries split sizes via an INTS attribute
rather than a second input. This PR lowers the gate to 1 and reads the
attribute in both the MLProgram and NeuralNetwork emitters, so `Split`
from any opset is supported on the CoreML EP.
The validation in `IsOpSupportedImpl` mirrors the existing input-form
rules — ≥2 outputs, sum of sizes equals the axis dim, all sizes
positive, axis dim not dynamic. For the no-attribute / no-input case
(legacy even-split) we also explicitly require the axis dim to be evenly
divisible by `num_outputs`, since CoreML's `num_splits` requires that.
This changes behavior only for opset 2–12 graphs, whose `Split` nodes
were always rejected by the CoreML EP before, so no previously working
path regresses.
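
The validation rules described above can be sketched as a standalone check. This is a minimal sketch, not the code in `IsOpSupportedImpl`: the name `IsSplitSupportedSketch` is hypothetical, a negative `axis_dim` is this sketch's own convention for a dynamic dimension, and the size-count check is not listed in the description but is added here because ONNX expects one size per output.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// axis_dim < 0 stands for "dynamic" in this sketch (an assumption).
// `sizes` is std::nullopt for the legacy even-split case.
bool IsSplitSupportedSketch(int64_t axis_dim, std::size_t num_outputs,
                            const std::optional<std::vector<int64_t>>& sizes) {
  if (axis_dim < 0) return false;     // axis dim must not be dynamic
  if (num_outputs < 2) return false;  // need at least two outputs

  if (!sizes) {
    // Even split: the axis dim must divide evenly across the outputs.
    return axis_dim % static_cast<int64_t>(num_outputs) == 0;
  }

  // Added assumption: one size per output.
  if (sizes->size() != num_outputs) return false;

  int64_t sum = 0;
  for (int64_t s : *sizes) {
    if (s <= 0) return false;  // every size must be positive
    sum += s;
  }
  return sum == axis_dim;  // sizes must cover the axis exactly
}
```

With these rules, `split=[512, 512, 128]` on an axis of 1152 is accepted, while a list containing a zero entry or an even split of 9 elements across 2 outputs is rejected.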
### Motivation
DWPose `dw-ll_ucoco_384.onnx` (opset 11), a common pose-estimation
model, has two `Split` nodes — one uneven (`split=[512, 512, 128]`) and
one even (`split=[1, 1]`). Both fall back to CPU today, fragmenting the
CoreML partition.
| | Without this PR | With this PR |
|---|---|---|
| CoreML partitions | 3 | **1** |
| Nodes on CoreML EP | 301 / 303 | **303 / 303** |
### Benchmark — M3 Max, MLProgram, batch 1, 1299-iter steady state
| Metric | Without PR | With PR | Δ |
|---|---|---|---|
| Mean | 6.838 ms | 6.565 ms | −4.0% |
| **StdDev** | **0.239 ms** | **0.170 ms** | **−29%** |
| P50 | 6.810 ms | 6.545 ms | −3.9% |
| P95 | 7.070 ms | 6.775 ms | −4.2% |
| P99 | 7.330 ms | 6.928 ms | −5.5% |
| P99.9 | 8.917 ms | 8.164 ms | −8.4% |
| **Max** | **12.616 ms** | **10.360 ms** | **−17.9%** |
Removing the two CPU↔CoreML round trips improves the tail far more than
the median (Max −17.9% and P99.9 −8.4%, versus −3.9% at P50), which
matters for real-time perception pipelines where worst-case latency
determines the frame budget.
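
The Δ column is the relative change against the "Without PR" baseline. As a minimal sketch of that arithmetic (the function name `PercentDelta` is made up for this illustration):

```cpp
// Relative change of `with_pr` against `without_pr`, in percent.
// Negative values mean the run with the PR is faster or tighter.
double PercentDelta(double without_pr, double with_pr) {
  return (with_pr - without_pr) / without_pr * 100.0;
}
```

For the Mean row, `PercentDelta(6.838, 6.565)` comes out at roughly −4.0%.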
### Tests
Eight new tests in
`onnxruntime/test/providers/coreml/coreml_basic_test.cc`, each
exercising both the NeuralNetwork and MLProgram emitters. The seven
positive tests assert full CoreML EP node assignment (no CPU fallback);
the negative test expects no CoreML assignment.
**Pre-opset-13 attribute form (the new code path):**
- `Split7UnevenAttribute` — opset 7 uneven `split=[4, 3, 2]`, covering
the opset 7–10 range.
- `Split11UnevenAttribute` — DWPose's pattern (uneven attribute at opset 11), here with `split=[4, 3, 2]`.
- `Split11EvenAttribute` — uniform sizes via attribute.
- `Split11NoAttributeEven` — falls through to the `num_splits =
num_outputs` branch.
**Opset-13+ input form (parity with the existing, untouched path):**
- `Split13UnevenInput` — `split` input `[4, 3, 2]`.
- `Split13EvenInput` — uniform sizes via input.
- `Split13NoSplitInputEven` — no `split` input, even-split fallback.
**Negative coverage:**
- `Split11ZeroSplitValueNotSupported` — verifies the attribute-form
rejection of a non-positive entry; expects no CoreML assignment.
All eight pass locally on macOS 26.3 / M3 Max.
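
For reference, the split sizes used in these tests can be traced by hand with a small sketch of how sizes map to slices along the split axis. The helper names `SplitRanges` and `EvenSplitSizes` are hypothetical and exist only for this illustration; they are not part of the test file or of ONNX Runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open [begin, end) ranges along the split axis, one per output.
std::vector<std::pair<int64_t, int64_t>> SplitRanges(const std::vector<int64_t>& sizes) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  int64_t begin = 0;
  for (int64_t size : sizes) {
    ranges.emplace_back(begin, begin + size);
    begin += size;
  }
  return ranges;
}

// Even split with no explicit sizes: every output gets axis_dim / num_outputs
// elements. Assumes the caller already checked that the division is exact.
std::vector<int64_t> EvenSplitSizes(int64_t axis_dim, int64_t num_outputs) {
  return std::vector<int64_t>(static_cast<std::size_t>(num_outputs),
                              axis_dim / num_outputs);
}
```

For `split=[4, 3, 2]` on an axis of 9 this yields the ranges `[0, 4)`, `[4, 7)` and `[7, 9)`; an even split of 8 elements into 2 outputs gives sizes `[4, 4]`.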
### Motivation for upstreaming
Most pre-2023 vision exports (DWPose, MMPose models, original
YOLOv5/v7/v8, etc.) target ONNX opset 11/12 and use the `Split`
attribute form. On the CoreML EP, every such `Split` currently falls back to CPU.
This is a self-contained gap with a clean fix.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>