onnxruntime
62567624 - [CoreML EP] Support bool Cast in ML Program (#28595)

Commit
5 days ago
[CoreML EP] Support bool Cast in ML Program (#28595)

### Summary

Two changes to the ML Program `Cast` builder:

1. **Accept `BOOL` as a source and target dtype** in `HasSupportedInputsImpl`. The ML Program `cast` op already handles bool, and `AddToModelBuilderImpl` already maps `to == BOOL`; only the input/output type gate omitted it.
2. **Move the "no preceding node" check after the ML Program early-return.** That check is legacy gating for the NeuralNetwork ArgMax-only path (which dereferences `InputEdgesBegin()`); on the ML Program path a `Cast` fed directly by a graph input is fine, and rejecting it forced needless CPU fallback.

### Why

This is the first of a **4-PR series** giving the CoreML EP the op coverage to run transformer and diffusion graphs as a *single CoreML partition* instead of fragmenting across CPU.

Transformer attention-mask graphs are a `Cast → GatherND → And → Where` chain over **bool** tensors. A CoreML partition cannot have a bool input/output (CoreML `MLMultiArray` has no bool type), so bool must stay *internal*, which makes `Cast` (the int↔bool boundary) the prerequisite for the rest of the series.

### Combined impact of the series

With all four PRs plus #28278 (scalar-`Gather`), every model below goes from 2 CoreML partitions to **1, with zero graph breaks**: the whole graph runs on CoreML. Measured on an Apple M3 Max, ML Program format:

| Model | partitions (before → after) | CoreML vs CPU |
|-------|:---------------------------:|--------------:|
| BERT-large (340M) | 2 → 1 | 7.3× (fp32) / 11.0× (fp16) |
| ViT-large (304M) | 2 → 1 | 8.5× (fp32) / 10.3× (fp16) |
| GPT-2-large (774M) | 2 → 1 | 11.4× (fp16) |
| SD-1.5 UNet (860M) | 2 → 1 | 9.7× (fp16) |

The op builders eliminate the graph breaks (deterministic); the speedups are what CoreML already delivers once a model is no longer fragmented.
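The check ordering described in the Summary can be sketched roughly as follows. This is a simplified illustration, not the actual builder code: `DType`, `CastNodeInfo`, `IsMLProgramCastType`, and `IsCastSupported` are hypothetical stand-ins, the set of non-bool types the real gate accepts is assumed, and the two real functions (`HasSupportedInputsImpl`, `IsOpSupportedImpl`) are collapsed into one for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real onnxruntime types.
// kOther represents any dtype the gate rejects.
enum class DType { kFloat, kFloat16, kInt32, kInt64, kBool, kOther };

struct CastNodeInfo {
  DType input_type;
  DType output_type;
  bool has_preceding_node;   // false when the Cast is fed directly by a graph input
  bool preceding_is_argmax;  // only meaningful when has_preceding_node is true
};

// ML Program type gate. kBool is the newly accepted entry; the other accepted
// types are an assumption made for this sketch.
inline bool IsMLProgramCastType(DType t) {
  switch (t) {
    case DType::kFloat:
    case DType::kFloat16:
    case DType::kInt32:
    case DType::kInt64:
    case DType::kBool:
      return true;
    default:
      return false;
  }
}

inline bool IsCastSupported(const CastNodeInfo& node, bool use_ml_program) {
  if (use_ml_program) {
    // Early return: the ML Program path never reaches the legacy check below,
    // so a Cast with no producer node is accepted if its dtypes pass the gate.
    return IsMLProgramCastType(node.input_type) &&
           IsMLProgramCastType(node.output_type);
  }
  // Legacy NeuralNetwork path (simplified): Cast is only supported directly
  // after ArgMax, so a Cast with no preceding node must be rejected before
  // the producer is inspected.
  if (!node.has_preceding_node) {
    return false;
  }
  return node.preceding_is_argmax;
}

// Usage example: an int64 -> bool Cast fed directly by a graph input.
inline void CheckGatingExamples() {
  const CastNodeInfo from_graph_input{DType::kInt64, DType::kBool, false, false};
  assert(IsCastSupported(from_graph_input, /*use_ml_program=*/true));
  assert(!IsCastSupported(from_graph_input, /*use_ml_program=*/false));
}
```

With the old ordering, the "no preceding node" rejection ran before the ML Program return, so the first assertion in `CheckGatingExamples` would have failed and the node would have fallen back to CPU.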
### Tests (`coreml_basic_test.cc`)

- `CastNonArgMaxNeuralNetworkNotSupported`: an `int64 → bool → float` cast chain falls back to CPU on the NeuralNetwork format, guarding the `IsOpSupportedImpl` reordering.

Positive `bool`-Cast coverage is in the dependent PRs: `Cast → GatherND → Cast` (#28598's `GatherNDBoolData_MLProgram`) and `Cast → And → Cast` (#28597's `And_MLProgram`). Both place a non-`Cast` op between the int↔bool casts and check the result against the CPU EP.

A *standalone* `int64 → Cast(bool) → Cast(float)` round-trip can't be verified here: CoreML's compiler fuses back-to-back `cast` ops and drops the bool clamp, so the pattern needs that intervening op, which only the dependent PRs provide.

### Series: CoreML EP coverage for transformer / diffusion graphs

- **#28595: Support bool Cast in ML Program** *(this PR, prerequisite)*
- #28596: Add Sin and Cos unary ops *(independent)*
- #28597: Add Where and And builders *(depends on #28595)*
- #28598: Add GatherND builder *(depends on #28595)*

Together with #28278 (scalar-`Gather`), the series takes BERT / GPT-2 / ViT / diffusion-UNet graphs, tiny and full-size, from 2 CoreML partitions to 1, with zero graph breaks.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
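To make the "bool clamp" in the Tests notes concrete, here is a small arithmetic sketch. It only illustrates the numeric difference between casting through bool and casting directly; it does not model CoreML's compiler, and the function names `CastViaBool` and `CastFusedDirect` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of int64 -> Cast(bool) -> Cast(float):
// the bool step maps every nonzero value to true, so the result is 0.0 or 1.0.
inline float CastViaBool(std::int64_t x) {
  const bool as_bool = (x != 0);
  return as_bool ? 1.0f : 0.0f;
}

// What the chain computes if the two casts are fused and the intermediate
// bool is dropped: a plain int64 -> float conversion with no clamp.
inline float CastFusedDirect(std::int64_t x) {
  return static_cast<float>(x);
}

// Usage example: the two agree on 0 and 1 but diverge on any other value.
inline void CheckClampExamples() {
  assert(CastViaBool(0) == CastFusedDirect(0));
  assert(CastViaBool(1) == CastFusedDirect(1));
  assert(CastViaBool(5) == 1.0f);
  assert(CastFusedDirect(5) == 5.0f);
}
```

An input restricted to 0 and 1 would not expose the difference, which is one reason the positive tests rely on an intervening non-`Cast` op rather than a standalone round-trip.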