Integrate ONNX 1.22.0 (opset 27) — issue #28752 (#28754)
### Integrate ONNX 1.22.0rc1 (opset 27)
Resolves #28752.
Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df`
(VERSION_NUMBER `1.22.0rc1`).
ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version
**unchanged (13 / `0x0D`)**.
This is the **RC validation phase** of an incremental integration (same
strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub
release is still a **draft** (no git tag yet), so re-pinning to the
released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC
now validates ONNX 1.22 against ORT before ONNX publishes the formal
release.
---
### Update: ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main` + closes #28969
ONNX published the formal **`v1.22.0`** GitHub release, so this PR is
re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`), completing the Phase 2
step deferred in the original rc1 description (see Follow-ups). The branch
was also **rebased onto `upstream/main`** to pick up the intervening
optimizer/opset-26 work. The released-tag tarball has a different asset
hash from the RC archives, so the vcpkg MS-internal asset mirror was
re-seeded for the final tag (otherwise the `--use_vcpkg` CI legs fail with
HTTP 404).
**Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX`
underflow). ONNX 1.22's expanded-Attention reference tests exposed a
latent WebGPU bug where a broadcast shape computed `dim - 1` on a
zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included
here, and the previously skipped reference tests are re-enabled.
**Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships
`DomainToVersionRange` **map-max 27** while the last *released* opset is
**26**, so **opset 27 stays under development** for the whole 1.22
cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`)
therefore throw *"Opset 27 under development"* at model load on every
`*CurrentOpset` fusion test that builds at the max opset. These tests
now load with per-model `ModelOptions{/*allow_released_opsets_only*/
false, /*strict_shape_type_inference*/ false}`, extending the existing
`38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset`
suite. This is **leg-agnostic** (it exercises opset 27 on every CI leg,
not just the relaxed ones) and **preserves opset coverage** (unlike
`GTEST_SKIP`). Each call site is annotated with a one-line WHY plus the
tracking issue (#28966) so the relaxation can be removed once opset 27
is released.
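The load-time gate these tests work around can be modeled as a two-step check: reject opsets beyond what the ONNX build knows, then reject "under development" opsets unless the released-only restriction is relaxed (per-model, or via `ALLOW_RELEASED_ONNX_OPSET_ONLY=0`). This is a minimal sketch, assuming the opset numbers stated above; `LoadOptions` and `check_model_opset` are hypothetical stand-ins, not ORT's `ModelOptions` or its actual validation code.

```python
from dataclasses import dataclass

# Values stated in the description; hardcoded only for this illustration.
LAST_RELEASED_OPSET = 26
MAX_KNOWN_OPSET = 27


@dataclass
class LoadOptions:
    # Hypothetical stand-in for the allow_released_opsets_only field.
    allow_released_opsets_only: bool = True


def check_model_opset(opset: int, options: LoadOptions) -> None:
    """Raise ValueError if the ai.onnx opset cannot be loaded under options."""
    if opset > MAX_KNOWN_OPSET:
        raise ValueError(f"Opset {opset} is not known to this ONNX build")
    if opset > LAST_RELEASED_OPSET and options.allow_released_opsets_only:
        raise ValueError(f"Opset {opset} under development")


if __name__ == "__main__":
    check_model_opset(27, LoadOptions(allow_released_opsets_only=False))  # loads
    try:
        check_model_opset(27, LoadOptions())  # strict default
    except ValueError as err:
        print(err)  # Opset 27 under development
```

Relaxing the flag per model, rather than skipping the test, keeps the opset-27 path exercised on strict legs too.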
`Resolves #28752` (unchanged). Closes #28969.
### Update: ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec
Since the original rc1 description was written, this PR was re-pinned
**rc1 → rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick
up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries
onnx#8051, which tightened `convTransposeShapeInference` to reject an
`output_shape`/`output_padding` whose size does not match the number of
spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX
Runtime now conforms to that spec** instead of patching ONNX to preserve
a non-standard form.
**⚠️ Breaking change: ConvTranspose `output_shape` now follows the ONNX
spec (spatial dimensions only).** ORT previously also accepted a
non-standard `rank + 2` form (spatial rank plus batch and channel), i.e.
`(N, C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a
ConvTranspose whose input has a **statically known rank** is rejected at
`Graph::Resolve` with *"Attribute output_shape has incorrect size"*.
**Migration:** specify `output_shape` with spatial dimensions only —
e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred
from the input and weight, so results are identical; the kernel ignores
`N, C`). Models whose ConvTranspose input has a **dynamic/unknown rank
are unaffected** — ONNX skips the size check and ORT computes the same
result (covered by the new
`ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).
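The migration rule above can be applied mechanically when the input rank is known. The sketch below assumes you already have the attribute values and the input rank (including N and C). `conform_output_shape` is a hypothetical helper, not part of ONNX or ORT. Dropping the first two entries is safe for the reason given above: batch and channel are inferred from the input and weight.

```python
from typing import List, Optional, Sequence


def conform_output_shape(output_shape: Sequence[int],
                         input_rank: Optional[int]) -> List[int]:
    """Rewrite a ConvTranspose output_shape to the spatial-only spec form.

    input_rank includes N and C, so the spec expects input_rank - 2 entries.
    An unknown rank (None) is passed through unchanged, mirroring ONNX
    skipping the size check for dynamic-rank inputs.
    """
    shape = list(output_shape)
    if input_rank is None:
        return shape
    spatial_rank = input_rank - 2
    if len(shape) == spatial_rank:
        return shape  # already spec-conformant
    if len(shape) == spatial_rank + 2:
        return shape[2:]  # drop the non-standard leading N, C entries
    raise ValueError("Attribute output_shape has incorrect size")


print(conform_output_shape([1, 1, 1, 14], input_rank=4))  # [1, 14]
```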
**Patch inventory — supersedes "2 files, 3 hunks" below.**
`cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch`
mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the
GroupNormalization-18 `.Deprecate()` removal — **no ConvTranspose
hunks**. rc2's strict shape-inference check is kept as-is; ORT's own
test models were conformed to the spec. The upstream archive hash,
`deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are
unchanged.
**Additional rc2 test conform.** rc2 also tightened
`convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input
tensor must have at least 3 dimensions"*). The hand-authored model in
`onnxruntime/test/python/quantization/test_op_split.py` declared a
spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid
NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping
the quantized-Split graph and expected outputs identical. No ORT source
change.
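The conformed `test_op_split.py` shapes can be sanity-checked against basic Conv layout rules: input `(N, C, d1, ...)` with rank ≥ 3, and weight `(M, C / group, k1, ...)` of the same rank. This is a minimal sketch; only the rank-3 message is quoted from rc2 above, while the other checks and messages are illustrative and not copied from ONNX's `convPoolShapeInference`.

```python
from typing import Sequence


def check_conv_shapes(x_shape: Sequence[int], w_shape: Sequence[int],
                      group: int = 1) -> None:
    """Raise ValueError for Conv input/weight shapes that break basic rules."""
    if len(x_shape) < 3:
        raise ValueError("Input tensor must have at least 3 dimensions")
    if len(w_shape) != len(x_shape):
        raise ValueError("Input and weight must have the same rank")
    # Weight layout is (M, C / group, k1, k2, ...); input is (N, C, d1, d2, ...).
    if w_shape[1] * group != x_shape[1]:
        raise ValueError("Input channels must equal weight channels * group")


check_conv_shapes([1, 1, 6, 3], [2, 1, 1, 1])  # conformed test shapes: passes
```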
> This note should also seed the GitHub Release notes for the ONNX 1.22
/ opset 27 milestone and the squash-commit message.
---
### What changed (29 files)
**Version plumbing**
- `cmake/deps.txt` — onnx archive URL → rc1 commit zip + SHA1
`421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx` — submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake` — `REF` commit form + tar.gz
SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json` — `version-semver` `1.22.0`,
`port-version` 0.
- `cmake/patches/onnx/onnx.patch` +
`cmake/vcpkg-ports/onnx/binskim.patch` — **byte-identical** rebase onto
1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option
(restructured for 1.22's new `onnx_core` OBJECT-lib /
`add_subdirectory(onnx)` layout) and the GroupNormalization-18
`.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings
hunk (already merged upstream in 1.22).
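Each re-pin changes the pinned digests (SHA1 in `deps.txt`, SHA512 in the vcpkg portfile), so a quick local check of a downloaded archive against the pinned value can catch a stale hash before CI does. This sketch assumes a `name;url;sha1` line layout for `deps.txt`; both helpers are hypothetical, and `https://example.com/onnx.zip` is a placeholder URL.

```python
import hashlib
from typing import Tuple


def parse_deps_line(line: str) -> Tuple[str, str, str]:
    """Split an entry assumed to be laid out as `name;url;sha1`."""
    name, url, digest = line.strip().split(";")
    return name, url, digest.lower()


def verify_archive(data: bytes, expected_hex: str, algo: str = "sha1") -> bool:
    """Check archive bytes against a pinned digest (e.g. sha1 or sha512)."""
    return hashlib.new(algo, data).hexdigest() == expected_hex.lower()


# Placeholder entry and bytes; a real check would read the downloaded archive.
demo = "onnx;https://example.com/onnx.zip;" + hashlib.sha1(b"demo").hexdigest()
_, _, pinned = parse_deps_line(demo)
print(verify_archive(b"demo", pinned))  # True
```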
**Opset-27 op enablement (Range)**
- `onnxruntime/core/providers/cpu/generator/range.cc`: split into a
versioned `[11, 26]` registration plus a new unversioned `27`
registration. The opset-27 kernel natively supports the existing common
numeric types (float/double/int16/int32/int64). **fp16 Range is covered**
via ONNX's Range-27 **function body**, which ORT expands into primitive
ops at partition time. **bf16 Range also relies on that function
expansion**, because there is no native bf16 kernel. Its bf16 reference
node test (`test_range_bfloat16_type_positive_delta`, base + `_expanded`)
is not exercised by the Python/numpy ONNX backend series, whose harness
cannot materialize bf16 (`Numpy_type 256`). A native fp16/bf16 kernel +
`stash_type` handling is a follow-up (for efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc` — versioned
the Range forward-declare + `BuildKernelCreateInfo` entries and added
the opset-27 registration.
- **CUDA Range**: same versioned `[11, 26]` + opset-27 split as CPU
(`onnxruntime/core/providers/cuda/generator/range.cc` +
`cuda_execution_provider.cc`). GPU-verified locally: all 8 opset-27 Range
node tests pass under `onnx_test_runner -e cuda`, with native Range-27
placed on CUDAExecutionProvider (fp16/bf16 via function expansion).
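The effect of the versioned split plus function-body fallback can be summarized as a lookup: find the registration whose opset window covers the node, use it if the type is native, otherwise fall back to expansion only if the schema version allows that type. This is a hypothetical model, not ORT's kernel registry. The kernel names are invented, and the Range-27 type set is an assumption based on the description above (at least float16 and bfloat16 added over Range-11).

```python
from typing import Optional

# Types each Range schema version accepts (assumed from the description above:
# Range-27 adds at least float16 and bfloat16 over Range-11).
SPEC_TYPES = {
    11: {"float", "double", "int16", "int32", "int64"},
    27: {"float", "double", "int16", "int32", "int64", "float16", "bfloat16"},
}
# Hypothetical kernel table: (first opset, last opset or None, name, native types).
NATIVE = {"float", "double", "int16", "int32", "int64"}
RANGE_KERNELS = [
    (11, 26, "Range[11,26]", NATIVE),
    (27, None, "Range[27+]", NATIVE),
]


def resolve_range(opset: int, dtype: str) -> Optional[str]:
    """Return the native kernel, 'function_expansion', or None if invalid."""
    for first, last, name, native in RANGE_KERNELS:
        if opset >= first and (last is None or opset <= last):
            if dtype in native:
                return name
            return "function_expansion" if dtype in SPEC_TYPES[first] else None
    return None  # no Range registration covers this opset
```

Capping the old registration at 26 is what lets an opset-27 node bind to the new entry instead of the open-ended opset-11 one.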
**Optimizer / EP opset ceilings**
- `…/transpose_optimization/optimizer_api.h` — `kMaxSupportedOpset` **26
→ 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h` —
`GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support
checks still gate — these EPs gain no new kernels here).
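Why raising the ceiling is low-risk: the opset window only decides whether a node is *offered* to the EP, and the per-op check still has the final say. This is a minimal sketch, loosely modeled on the `base_op_builder.h` guard; the function and its parameters are hypothetical, and comparing the node's since-version (rather than the model opset) is an assumption.

```python
def ep_claims_node(since_version: int, max_supported_opset: int,
                   per_op_supported: bool, min_supported_opset: int = 1) -> bool:
    """Opset window check first, then the per-op support check decides."""
    if not (min_supported_opset <= since_version <= max_supported_opset):
        return False  # outside the EP's opset window: never offered to the EP
    return per_op_supported


print(ep_claims_node(26, 25, True))   # False: capped by the old ceiling
print(ep_claims_node(26, 27, True))   # True: per-op check accepts it
print(ep_claims_node(27, 27, False))  # False: per-op check still rejects
```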
**Fusion updates**
- `onnxruntime/core/optimizer/gather_fusion.cc` — GatherToSlice Range
version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc` — add `27` to
the two Range path-matchers (`parent_path_3/4`) so embedding fusion
still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc` — new opset-27
GatherToSliceFusion test.
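These fusion edits are needed because pattern matchers typically gate on an explicit list of since-versions. A new schema version (Range-27) is a new since-version, so an old allowlist silently stops matching. The sketch below is a hypothetical Python analogue of such a check, not ORT's C++ helper.

```python
from typing import Iterable


def is_supported_op_version(op_type: str, since_version: int,
                            expected_op: str, versions: Iterable[int]) -> bool:
    """True when the node is `expected_op` at one of the listed since-versions."""
    return op_type == expected_op and since_version in set(versions)


OLD_RANGE_VERSIONS = [1, 11]      # before this PR
NEW_RANGE_VERSIONS = [1, 11, 27]  # after this PR

print(is_supported_op_version("Range", 27, "Range", OLD_RANGE_VERSIONS))  # False
print(is_supported_op_version("Range", 27, "Range", NEW_RANGE_VERSIONS))  # True
```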
**Requirements (7 bumped)**
- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on
PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to
this bump; intentionally untouched).
**Generated docs / test data**
- `js/web/docs/webgl-operators.md` — regenerated.
- `docs/OperatorKernels.md` — **surgical** edit: CPU EP **and** CUDA EP
Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc` —
**comment-only**: documents why no opset-27 CPU exclusions are needed
(all opset-27 node tests pass via function expansion).
**Docs**
- `.agents/skills/onnx-opset-bump-checklist/SKILL.md` — new reusable
checklist skill distilled from this integration. Now also documents the
"bump **all** execution providers together" tradition (CPU + CUDA +
JS/DML assessment in one pass) so future opset bumps don't ship a
partial EP set.
---
### Validation (CPU EP + CUDA EP, Linux x64)
- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased
`ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅ — **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅:
**62/62 pass** via ONNX function-body expansion (run with
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState,
LinearAttention, and fp16/bf16 Range, even though none of these has a
native kernel.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and
**RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range
node tests ✅ — **8/8 pass**, with native Range-27 placed on
CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via
function-body expansion.
---
### Standing caveats (please read before reviewing)
1. **CUDA EP now locally verified for Range; other GPU EPs/ops still
CI-only.** The CUDA EP was built and the opset-27 **Range** node tests
run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops
were **not** exercised here. Function-body expansion is EP-agnostic, so
other opset-27 models are expected to run on those EPs too, but broader
GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU and CUDA Range rows
only). A CPU-only *full* regen would destructively wipe the
CUDA/DML/other-EP sections (the generator only emits rows for the EPs in
the built module). A correct multi-EP regen needs a build per EP and is a
follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map.
ORT's load-time validation rejects opset-27 models unless
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The
opset-27 **schemas are always compiled in from the submodule**
regardless — this gate only affects model load-time acceptance, not
schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an
*upper* guard only; raising it merely lets opset-26/27 nodes reach the
per-op support checks that still gate correctness. No regression — it
also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream
ONNX CMake regression** (the `onnx_core` OBJECT-library split in
onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by
onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset
bump. Tracked upstream at
[onnx/onnx#8053](https://github.com/onnx/onnx/issues/8053). Non-Xcode
builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are
unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin
once ONNX ships the fix.
---
### Follow-ups (explicitly NOT in this PR)
- **GPU/multi-EP coverage:** run the remaining opset-27 CUDA node tests
(beyond Range) and the DML node tests; regenerate `OperatorKernels.md`
across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered
open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO`
registration system — assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split — `range.cc` registers `Range`
`.SinceVersion(11)` open-ended, so it already claims opset-27 Range;
only the new bf16 type is unsupported and falls back via the `T`
type-constraint (function expansion). Mirror the CPU/CUDA versioned
`[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and
`LinearAttention` kernels, and a native fp16/bf16 + `stash_type`
Range-27 kernel (replace today's function-expansion path with efficient
kernels).
- **Phase 2 — formal `v1.22.0` re-pin:** re-pin
`deps.txt`/submodule/portfile/requirements to the released tag once ONNX
publishes it (currently blocked on ONNX tagging the release); upload the
tag tarball to the vcpkg mirror. **This also restores the iOS/macOS
Xcode framework build** once the upstream onnx OBJECT-library Xcode
regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in
`find_optimizer_opset_version_updates_required.py` (placeholder `ver`
parsed as int) so it can be relied on for future bumps.
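One way to harden that parsing is to skip non-numeric placeholder tokens instead of calling `int()` on them unconditionally. This is a minimal sketch: the script's internals are not shown in this PR, so the function name and the assumption that versions arrive as string tokens are hypothetical.

```python
from typing import Iterable, List


def parse_opset_versions(tokens: Iterable[str]) -> List[int]:
    """Parse version tokens, skipping non-numeric placeholders such as 'ver'."""
    versions = []
    for tok in tokens:
        tok = tok.strip()
        if tok.isdigit():
            versions.append(int(tok))
    return versions


print(parse_opset_versions(["1", "11", "ver", " 27 "]))  # [1, 11, 27]
```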
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>