onnxruntime
aa92574b - [CoreML EP] Add FusedConv support (#28289)

Committed 17 days ago
### Description

Adds support for `com.microsoft:FusedConv` to the CoreML EP's MLProgram path. The NeuralNetwork path explicitly rejects the op rather than silently dropping its activation (see Implementation).

`FusedConv` is produced by ORT's `ConvActivationFusion` pass when a model is optimized with the CPU EP (or any EP in `cpu_acl_js_webgpu_eps`) and saved via `session.optimized_model_filepath` or the ORT-format conversion tool. The saved graph contains `com.microsoft:FusedConv` nodes that, before this patch, the CoreML EP could not claim. This fragmented the partition.

ORT's in-process pipeline does not currently run `ConvActivationFusion` when the CoreML EP is the target, because the fusion's compatible-EP set excludes CoreML. As a result, `FusedConv` typically reaches the CoreML EP only via pre-optimized graphs. That is a real and common workflow. Anyone who ships a pre-optimized model artifact (mobile pipelines, ORT-format models, session-cached optimized graphs) and then loads it with the CoreML EP hits this path.

There is no pre-existing issue tracking this. It was discovered during partitioning analysis of DWPose and ResNet50 on Apple Silicon.

### Empirical impact

The benchmark model is ResNet50-v2 from the ONNX model zoo, CPU-optimized at `ORT_ENABLE_EXTENDED` and reloaded on the CoreML EP. It has 108 nodes total, 33 of them `FusedConv` with Relu activation.

Setup: M3 Max, MLProgram, batch 1, 100-iteration timed runs, 3 interleaved rounds (n=597 per variant).

| | Partitions | Nodes on CoreML | Mean (ms) | StdDev (ms) | P99 (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| Without this patch | 18 | 75 / 108 | 23.34 | 1.01 | 27.68 | 30.59 |
| **With this patch** | **1** | **108 / 108** | **2.94** | **0.16** | **3.75** | **4.32** |

**7.94× mean speedup.** The 33 `FusedConv` nodes that previously fell back to CPU now stay on the ANE/GPU. Run-to-run spread also tightens about 6× (stddev 1.01 → 0.16 ms).
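The headline ratios can be re-derived from the table. This short Python check uses only the figures reported above. It also makes explicit that the roughly 6× figure is a ratio of standard deviations; the corresponding variance ratio is its square, roughly 40×.

```python
# Figures copied from the ResNet50-v2 table above (M3 Max, MLProgram, batch 1).
total_nodes = 108
coreml_nodes_before = 75
fused_conv_nodes = 33
mean_before_ms, mean_after_ms = 23.34, 2.94
std_before_ms, std_after_ms = 1.01, 0.16

# Nodes the CoreML EP could not claim before the patch.
fallback_nodes = total_nodes - coreml_nodes_before
assert fallback_nodes == fused_conv_nodes  # every fallback node was a FusedConv

speedup = mean_before_ms / mean_after_ms        # ratio of mean latencies
stddev_ratio = std_before_ms / std_after_ms     # how much the spread tightens
variance_ratio = stddev_ratio ** 2              # variance scales with stddev squared

print(f"mean speedup:   {speedup:.2f}x")        # 7.94x
print(f"stddev ratio:   {stddev_ratio:.1f}x")   # 6.3x
print(f"variance ratio: {variance_ratio:.0f}x") # 40x
```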
Partition counts on other Conv-heavy ONNX-zoo models after CPU optimization:

| Model | Partitions without | Partitions with | Notes |
|---|---|---|---|
| ResNet50-v2 | 18 | **1** | 33 FusedConv (Relu) |
| FCN-ResNet50 | 18 | **1** | 35 FusedConv (Relu); fails to compile on CoreML for unrelated reasons |
| YOLOv3 (full) | 27 | **4** | 72 FusedConv (LeakyRelu); detection post-processing fails on CoreML for unrelated dynamic-shape reasons |
| YOLOv3-tiny | 13 | **7** | 11 FusedConv (LeakyRelu); same post-processing failure as YOLOv3 |

The partition count drops on every architecture tested. ResNet50 is the only model in this set that runs end-to-end on the CoreML EP today. The FCN and YOLO failures are orthogonal CoreML EP limitations in segmentation upsampling and detection post-processing.

### Implementation

Reuses `ConvOpBuilder`, which now branches on `op_type`:

- `Conv`: behaviour unchanged.
- `FusedConv`: emits the `conv` MIL op into an intermediate, then chains the activation MIL op on top.

All six activation types that `ConvActivationFusion` produces are supported:

| ONNX activation | MIL op | Params |
|---|---|---|
| Relu | `relu` | – |
| Sigmoid | `sigmoid` | – |
| Tanh | `tanh` | – |
| LeakyRelu | `leaky_relu` | alpha (from `activation_params`) |
| Clip | `clip` | alpha=min, beta=max (from `activation_params`) |
| HardSigmoid | `sigmoid_hard` | alpha, beta (from `activation_params`) |

`IsOpSupportedImpl` rejects `FusedConv` in two cases:

- NeuralNetwork mode, which would emit an unfused Conv and silently lose the activation.
- Any unrecognized activation string.
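The dispatch described above can be sketched in Python. This is a minimal sketch only: a toy single-channel 1-D convolution stands in for the real MIL `conv` op. `ACTIVATION_TO_MIL`, `is_fused_conv_supported`, `map_activation`, `mil_activation`, `conv1d` and `fused_conv` are hypothetical names, not ORT's C++ `ConvOpBuilder` code. The scalar formulas follow the standard definitions of each MIL activation op. The sketch also assumes every parameter is present in `activation_params`; how the real builder handles a missing one is not shown here.

```python
import math

# Hypothetical table mirroring the one above:
# ONNX activation -> (MIL op, MIL param names filled positionally from activation_params).
ACTIVATION_TO_MIL = {
    "Relu": ("relu", ()),
    "Sigmoid": ("sigmoid", ()),
    "Tanh": ("tanh", ()),
    "LeakyRelu": ("leaky_relu", ("alpha",)),
    "Clip": ("clip", ("alpha", "beta")),  # alpha = min, beta = max
    "HardSigmoid": ("sigmoid_hard", ("alpha", "beta")),
}


def is_fused_conv_supported(activation, use_mlprogram):
    # Rule 1: NeuralNetwork mode would emit a bare Conv and lose the activation.
    if not use_mlprogram:
        return False
    # Rule 2: unrecognized activation strings are rejected.
    return activation in ACTIVATION_TO_MIL


def map_activation(activation, activation_params=()):
    """Return (mil_op, {mil_param: value}); this sketch requires every param."""
    mil_op, names = ACTIVATION_TO_MIL[activation]
    if len(activation_params) != len(names):
        raise ValueError(f"{activation}: expected {len(names)} params")
    return mil_op, dict(zip(names, activation_params))


def mil_activation(mil_op, p, x):
    """Scalar semantics of each MIL activation op, using MIL parameter names."""
    if mil_op == "relu":
        return max(0.0, x)
    if mil_op == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))
    if mil_op == "tanh":
        return math.tanh(x)
    if mil_op == "leaky_relu":
        return x if x >= 0 else p["alpha"] * x
    if mil_op == "clip":
        return min(max(x, p["alpha"]), p["beta"])
    if mil_op == "sigmoid_hard":
        return min(max(p["alpha"] * x + p["beta"], 0.0), 1.0)
    raise ValueError(mil_op)


def conv1d(x, w, b):
    """Single-channel 'valid' 1-D convolution (cross-correlation, as in ONNX Conv)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b for i in range(len(x) - k + 1)]


def fused_conv(x, w, b, activation, activation_params=()):
    """Compute the conv into an intermediate, then chain the activation on top."""
    mil_op, p = map_activation(activation, activation_params)
    intermediate = conv1d(x, w, b)
    return [mil_activation(mil_op, p, v) for v in intermediate]
```

For example, `fused_conv([1, -2, 3, -4], [1, 1], 0.0, "Clip", (-0.5, 0.5))` first produces the intermediate `[-1.0, 1.0, -1.0]` and then returns `[-0.5, 0.5, -0.5]`. Keying `mil_activation` on MIL parameter names makes the Clip row's `alpha=min, beta=max` renaming explicit.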
### Tests

Six new tests in `onnxruntime/test/providers/coreml/coreml_basic_test.cc`, one per supported activation. Together they cover each parameter pattern: no params, one param, two positional params and two named params.

- `FusedConvTestRelu`: no `activation_params` attribute
- `FusedConvTestSigmoid`: same shape; exercises sigmoid op-name dispatch
- `FusedConvTestTanh`: same shape; exercises tanh op-name dispatch
- `FusedConvTestLeakyRelu`: single param (alpha); the YOLOv3 case
- `FusedConvTestClip`: two params (min, max)
- `FusedConvTestHardSigmoid`: two params (alpha, beta); depends on the HardSigmoid CoreML builder landed in #28182

Each test verifies CoreML output against the CPU EP reference and asserts `ExpectedEPNodeAssignment::All`. All pass locally on macOS 26.3 / M3 Max.

Also adds the supported-ops doc entry.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>