Add packaging pipeline for CUDA plugin EP (#28152)
## Description
This PR adds an Azure Pipelines packaging flow for the CUDA plugin EP,
following the existing WebGPU plugin packaging pipeline pattern. The new
pipeline can package Windows x64 and Linux x64 builds for both CUDA 12.8
and 13.0, and optionally package Linux aarch64 builds when CUDA 13.0 is
selected.
The flow is parameterized for CUDA version, package version, build type,
and Python configuration so the packaging matrix can be expanded without
duplicating pipeline logic. It also adds validation to reject
unsupported combinations such as Linux aarch64 with CUDA 12.8.
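As a rough illustration of the validation rule described above, the check can be sketched in Python. This is not the pipeline's actual YAML logic. The function name `validate_packaging_request` and the parameter `package_linux_aarch64` are hypothetical and were chosen only for this example.

```python
# Illustrative sketch only: the real validation lives in the Azure Pipelines YAML.
# All names below are hypothetical and not taken from the PR.
SUPPORTED_CUDA_VERSIONS = ("12.8", "13.0")


def validate_packaging_request(cuda_version: str, package_linux_aarch64: bool) -> None:
    """Raise ValueError for combinations the PR description calls unsupported."""
    if cuda_version not in SUPPORTED_CUDA_VERSIONS:
        raise ValueError(
            f"Unsupported cuda_version {cuda_version!r}; "
            f"expected one of {SUPPORTED_CUDA_VERSIONS}"
        )
    # Linux aarch64 packaging is only allowed together with CUDA 13.0.
    if package_linux_aarch64 and cuda_version != "13.0":
        raise ValueError("Linux aarch64 packaging requires cuda_version 13.0")


if __name__ == "__main__":
    validate_packaging_request("13.0", package_linux_aarch64=True)  # accepted
    try:
        validate_packaging_request("12.8", package_linux_aarch64=True)
    except ValueError as err:
        print(f"rejected: {err}")
```

Failing fast on an invalid combination keeps a misconfigured run from reaching the build stages at all.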
## Summary of Changes
### Azure Pipelines packaging flow
| File | Change |
|---|---|
| `tools/ci_build/github/azure-pipelines/plugin-cuda-pipeline.yml` | Adds the top-level official pipeline with CUDA 12.8/13.0 selection, package/build-type validation, and aarch64 gating for CUDA 13.0 only. |
| `tools/ci_build/github/azure-pipelines/stages/plugin-cuda-packaging-stage.yml` | Adds the packaging orchestrator that fans out per-platform/per-Python build stages and merges Linux artifacts. |
| `tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-stage.yml` | Adds the Linux packaging stage template for x64 and aarch64, parameterized by CUDA version, Python executable, Docker image, and CUDA architectures. |
| `tools/ci_build/github/azure-pipelines/stages/plugin-win-cuda-stage.yml` | Adds the Windows packaging stage template with CUDA-version-specific SDK setup and CUDA 13.0 cuDNN handling. |
### Linux build script
| File | Change |
|---|---|
| `tools/ci_build/github/linux/build_cuda_plugin_package.sh` | Adds a Docker-based CUDA plugin packaging script with parameters for build config, Python executable, CUDA version, and `CMAKE_CUDA_ARCHITECTURES`. |
### Packaging behavior
- Supports `cuda_version` = `12.8` or `13.0`.
- Restricts Linux aarch64 packaging to CUDA 13.0 because the aarch64
CUDA Docker image is only available for CUDA 13.x.
- Uses CUDA-version-specific Docker base images and CUDA architecture
lists.
- Threads Python configuration through the Linux packaging path so
wheel-producing builds can be selected per Python version.
- Merges Linux per-version artifacts into a combined Linux artifact for
downstream consumption.
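
The Linux fan-out described in the list above can be pictured with a small Python sketch. It is an assumption-laden illustration, not the orchestrator template itself. The names `LinuxStage` and `plan_linux_stages` are hypothetical, and the Python version strings in the usage lines are placeholders rather than the versions the pipeline actually builds.

```python
# Illustrative sketch only: models the per-arch / per-Python Linux fan-out.
# All names and example Python versions are hypothetical.
from typing import List, NamedTuple


class LinuxStage(NamedTuple):
    arch: str
    cuda_version: str
    python_version: str


def plan_linux_stages(
    cuda_version: str,
    python_versions: List[str],
    include_aarch64: bool,
) -> List[LinuxStage]:
    """Enumerate one Linux build stage per (architecture, Python version)."""
    if cuda_version not in ("12.8", "13.0"):
        raise ValueError(f"Unsupported cuda_version {cuda_version!r}")
    arches = ["x64"]
    if include_aarch64:
        # aarch64 is gated on CUDA 13.0, matching the restriction above.
        if cuda_version != "13.0":
            raise ValueError("Linux aarch64 packaging requires cuda_version 13.0")
        arches.append("aarch64")
    return [
        LinuxStage(arch, cuda_version, py)
        for arch in arches
        for py in python_versions
    ]


if __name__ == "__main__":
    for stage in plan_linux_stages("13.0", ["3.11", "3.12"], include_aarch64=True):
        print(stage)
```

In this model, the outputs of all enumerated stages would then be merged into the single combined Linux artifact that downstream jobs consume.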
## Testing
- Not run locally. This change adds CI pipeline definitions and
packaging scripts only.
## Motivation and Context
The CUDA plugin EP already has GitHub Actions CI coverage for Linux and
Windows builds, but it does not yet have a matching Azure Pipelines
packaging flow like the WebGPU plugin EP. Adding this packaging pipeline
makes it possible to publish packaged CUDA plugin artifacts through the
same official packaging infrastructure, while also supporting the newer
CUDA 13.0 configuration and Linux aarch64 packaging where the required
Docker image exists.
## Checklist
- [x] Tests added/updated or not required
- [x] Documentation updated or not applicable
- [x] No breaking changes
- [ ] CI passes