onnxruntime
43fd961a - Integrate ONNX 1.22.0 (opset 27) — issue #28752 (#28754)

Commit

15 days ago

Integrate ONNX 1.22.0 (opset 27), issue #28752 (#28754)

### Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df` (VERSION_NUMBER `1.22.0rc1`).

ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version **unchanged (13 / `0x0D`)**.

This is the **RC validation phase** of an incremental integration (same strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub release is still a **draft** (no git tag yet), so re-pinning to the released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC now validates ONNX 1.22 against ORT before ONNX publishes the formal release.

---

### Update: ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main` + closes #28969

ONNX published the formal **`v1.22.0`** GitHub release, so this PR is re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`); this is the Phase-2 step deferred in the rc1 description below. The branch was also **rebased onto `upstream/main`** to pick up the intervening optimizer/opset-26 work. The released tag tarball has a different asset hash from the RCs, so the vcpkg MS-internal asset mirror was re-seeded for the final tag (otherwise the `--use_vcpkg` legs would fail with a 404).

**Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX` underflow). ONNX 1.22's expanded-Attention reference tests exposed a latent WebGPU bug where a broadcast shape computed `dim - 1` on a zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included here and the previously-skipped reference tests are re-enabled.

**Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships `DomainToVersionRange` **map-max 27** while the last *released* opset is **26**, so **opset 27 stays under development** for the whole 1.22 cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`) therefore throw *"Opset 27 under development"* at model load on every `*CurrentOpset` fusion test that builds at the max opset.
These tests now load with per-model `ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false}`, extending the existing `38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset` suite. This is **leg-agnostic** (exercises opset 27 on every CI leg, not just the relaxed ones) and **preserves opset coverage** (vs. `GTEST_SKIP`). Each call site is annotated with a one-line WHY + tracking issue (#28966) so the relaxation can be removed once opset 27 is released.

`Resolves #28752` (unchanged). Closes #28969.

### Update: ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec

Since the original rc1 description below, this PR was re-pinned **rc1 → rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries onnx#8051, which tightened `convTransposeShapeInference` to reject an `output_shape`/`output_padding` whose size does not match the number of spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX Runtime now conforms to that spec** instead of patching ONNX to preserve a non-standard form.

**⚠️ Breaking change: ConvTranspose `output_shape` now follows the ONNX spec (spatial dimensions only).** ORT previously also accepted a non-standard `rank + 2` form that included batch and channel, i.e. `(N, C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a ConvTranspose whose input has a **statically-known rank** is rejected at `Graph::Resolve` with *"Attribute output_shape has incorrect size"*.

**Migration:** specify `output_shape` with spatial dimensions only, e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred from the input and weight, so results are identical; the kernel ignores `N, C`).
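
The migration rule above can be sketched as a small helper. This is a minimal illustration in plain Python, not ORT or ONNX code: `to_spatial_output_shape` and its parameters are hypothetical names introduced here, and it assumes the input rank is statically known and counts the batch and channel dimensions.

```python
from typing import List, Sequence


def to_spatial_output_shape(output_shape: Sequence[int], input_rank: int) -> List[int]:
    """Return a spatial-only ConvTranspose output_shape.

    Hypothetical helper: input_rank includes batch and channel, so a 2-D
    ConvTranspose over an NCHW input has input_rank == 4 and 2 spatial dims.
    """
    if input_rank < 3:
        raise ValueError("input needs batch, channel, and at least one spatial dim")
    num_spatial = input_rank - 2
    dims = list(output_shape)
    if len(dims) == num_spatial:
        # Already in the spatial-only form described by the ONNX spec.
        return dims
    if len(dims) == num_spatial + 2:
        # Legacy (N, C, spatial...) form: drop the leading batch and channel entries.
        return dims[2:]
    raise ValueError(
        f"output_shape has {len(dims)} entries; expected {num_spatial} "
        f"(or the legacy {num_spatial + 2})"
    )
```

For the example in the text, `to_spatial_output_shape([1, 1, 1, 14], 4)` returns `[1, 14]`; a shape that is already spatial-only passes through unchanged.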
Models whose ConvTranspose input has a **dynamic/unknown rank are unaffected**: ONNX skips the size check and ORT computes the same result (covered by the new `ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).

**Patch inventory (supersedes "2 files, 3 hunks" below).** `cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch` mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the GroupNormalization-18 `.Deprecate()` removal, with **no ConvTranspose hunks**. rc2's strict shape-inference check is kept as-is; ORT's own test models were conformed to the spec. The upstream archive hash, `deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are unchanged.

**Additional rc2 test conformance.** rc2 also tightened `convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input tensor must have at least 3 dimensions"*). The hand-authored model in `onnxruntime/test/python/quantization/test_op_split.py` declared a spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping the quantized-Split graph and expected outputs identical. No ORT source change.

> This note should also seed the GitHub Release notes for the ONNX 1.22 / opset 27 milestone and the squash-commit message.

---

### What changed (29 files)

**Version plumbing**

- `cmake/deps.txt`: onnx archive URL → rc1 commit zip + SHA1 `421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx`: submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake`: `REF` commit form + tar.gz SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json`: `version-semver` `1.22.0`, `port-version` 0.
- `cmake/patches/onnx/onnx.patch` + `cmake/vcpkg-ports/onnx/binskim.patch`: **byte-identical** rebase onto 1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option (restructured for 1.22's new `onnx_core` OBJECT-lib / `add_subdirectory(onnx)` layout) and the GroupNormalization-18 `.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings hunk (already merged upstream in 1.22).

**Opset-27 op enablement (Range)**

- `onnxruntime/core/providers/cpu/generator/range.cc`: split into versioned `[11, 26]` + a new unversioned `27` registration. The opset-27 kernel natively supports the existing common numeric types (float/double/int16/int32/int64). **fp16 Range is covered** via ONNX's Range-27 **function body**, which ORT expands into primitive ops at partition time. **bf16 Range is deferred to that same function expansion**: there is no native bf16 kernel, and its bf16 reference node test (`test_range_bfloat16_type_positive_delta`, base + `_expanded`) is not exercised by the Python/numpy ONNX backend series, whose harness cannot materialize bf16 (`Numpy_type 256`); a native fp16/bf16 kernel + `stash_type` handling is a follow-up (efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc`: versioned the Range forward-declare + `BuildKernelCreateInfo` entries and added the opset-27 registration.
- **CUDA Range**: same versioned `[11, 26]` + opset-27 split as CPU (`onnxruntime/core/providers/cuda/generator/range.cc` + `cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner -e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on CUDAExecutionProvider (fp16/bf16 via function expansion).

**Optimizer / EP opset ceilings**

- `…/transpose_optimization/optimizer_api.h`: `kMaxSupportedOpset` **26 → 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h`: `GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support checks still gate, and these EPs gain no new kernels here).
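
The versioned `[11, 26]` + open-ended `27` split can be pictured as a lookup over version ranges. The sketch below is a simplified model in plain Python, not ORT's kernel registry: `KernelReg`, `REGISTRY`, and `resolve` are hypothetical names, and real kernel matching also involves the op's schema version, domain, provider, and type constraints.

```python
from typing import List, NamedTuple, Optional


class KernelReg(NamedTuple):
    """Hypothetical registration record: an op served over a version range."""
    op: str
    since: int
    end: Optional[int]  # None means open-ended ("27+")
    label: str


# Assumed layout after the split described above.
REGISTRY: List[KernelReg] = [
    KernelReg("Range", 11, 26, "Range[11,26]"),
    KernelReg("Range", 27, None, "Range[27+]"),
]


def resolve(op: str, version: int, registry: List[KernelReg] = REGISTRY) -> Optional[str]:
    """Return the label of the first registration whose range covers `version`."""
    for reg in registry:
        in_range = reg.since <= version and (reg.end is None or version <= reg.end)
        if reg.op == op and in_range:
            return reg.label
    # No kernel claims this version; a real EP would fall back or reject the node.
    return None
```

With this layout, versions 11 through 26 resolve to the older registration and 27 onward resolve to the new one, so a later change to the opset-27 kernel cannot alter behavior for older models.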
**Fusion updates**

- `onnxruntime/core/optimizer/gather_fusion.cc`: GatherToSlice Range version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc`: add `27` to the two Range path-matchers (`parent_path_3/4`) so embedding fusion still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc`: new opset-27 GatherToSliceFusion test.

**Requirements (7 bumped)**

- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to this bump; intentionally untouched).

**Generated docs / test data**

- `js/web/docs/webgl-operators.md`: regenerated.
- `docs/OperatorKernels.md`: **surgical** edit: CPU EP **and** CUDA EP Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc`: **comment-only**: documents why no opset-27 CPU exclusions are needed (all opset-27 node tests pass via function expansion).

**Docs**

- `.agents/skills/onnx-opset-bump-checklist/SKILL.md`: new reusable checklist skill distilled from this integration. Now also documents the "bump **all** execution providers together" tradition (CPU + CUDA + JS/DML assessment in one pass) so future opset bumps don't ship a partial EP set.

---

### Validation (CPU EP + CUDA EP, Linux x64)

- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased `ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅: **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅: **62/62 pass** via ONNX function-body expansion (run with `ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState, LinearAttention, and fp16/bf16 Range, despite no native kernels for them.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and **RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range node tests ✅: **8/8 pass**, with native Range-27 placed on CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via function-body expansion.

---

### Standing caveats (please read before reviewing)

1. **CUDA EP now locally verified for Range; other GPU EPs/ops still CI-only.** The CUDA EP was built and the opset-27 **Range** node tests run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops were **not** exercised here. Function-body expansion is EP-agnostic, so other opset-27 models are expected to run on those EPs too, but broader GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU and CUDA Range rows only). A CPU-only *full* regen would destructively wipe the CUDA/DML/other-EP sections (the generator only emits rows for the EPs in the built module). A correct multi-EP regen needs a build per EP and is a follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map. ORT's load-time validation rejects opset-27 models unless `ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The opset-27 **schemas are always compiled in from the submodule** regardless; this gate only affects model load-time acceptance, not schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an *upper* guard only; raising it merely lets opset-26/27 nodes reach the per-op support checks that still gate correctness. No regression: it also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream ONNX CMake regression** (the `onnx_core` OBJECT-library split in onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset bump. Tracked upstream at [onnx/onnx#8053](https://github.com/onnx/onnx/issues/8053).
   Non-Xcode builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin once ONNX ships the fix.

---

### Follow-ups (explicitly NOT in this PR)

- **GPU/multi-EP coverage:** run opset-27 CUDA/DML node tests; regenerate `OperatorKernels.md` across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO` registration system; assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split: `range.cc` registers `Range` `.SinceVersion(11)` open-ended, so it already claims opset-27 Range; only the new bf16 type is unsupported and falls back via the `T` type-constraint (function expansion). Mirror the CPU/CUDA versioned `[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and `LinearAttention` kernels, and a native fp16/bf16 + `stash_type` Range-27 kernel (replace today's function-expansion path with efficient kernels).
- **Phase 2, formal `v1.22.0` re-pin:** re-pin `deps.txt`/submodule/portfile/requirements to the released tag once ONNX publishes it (currently blocked on ONNX tagging the release); upload the tag tarball to the vcpkg mirror. **This also restores the iOS/macOS Xcode framework build** once the upstream onnx OBJECT-library Xcode regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in `find_optimizer_opset_version_updates_required.py` (placeholder `ver` parsed as int) so it can be relied on for future bumps.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

References

#28754 - Integrate ONNX 1.22.0 (opset 27) — issue #28752

Author

titaiwangms

Parents

cbb74576
