onnxruntime
fbba40a4 - Model Package Support (#27786)

Commit
3 days ago
Model Package Support (#27786)

### Description

To support the model package design, one of the goals for ORT is to automatically select the most suitable compiled EPContext binary from a collection of precompiled variants, based on the EP, provider options, metadata, and available devices.

This PR adds first-phase model package support to ORT. Follow-up PRs may extend it.

A model package is a collection of models, binaries, and metadata files organized in a hierarchically structured directory. The directory structure is not yet finalized, so the following is just a simple example of a model package directory:

````
<model>.ortpackage/
├── manifest.json
├── pipeline.json
├── configs/
|   ├── genai_config.json
|   └── chat_template.jinja
└── models/
    └── model_name/
        ├── metadata.json
        |   └── Contains general information on the component model,
        |       and specific information about each model variant
        |       such as data types, quantization algo, EP, etc. that
        |       is updated on add/remove of model variant
        └── shared_weights/ (shared weights from all variants)
            └── <checksum of weights file A>/
                └── model.data
            └── <checksum of weights file B>/
                └── model.data
            └── ...
        └── base model/
            ├── model.onnx
        └── variant A /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
        └── variant B /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
````

#### Spec and Format:

See [here](https://github.com/microsoft/onnxruntime/blob/07e55627e75da24099c582331a0f786090e6382a/onnxruntime/core/session/model_package/README.md)

#### Definitions:

- Model Package
  - A model package defines the overall logical ‘model’
  - A model package contains one or more ‘component models’
- Component Model
  - A component model comprises one or more ‘model variants’
- Model Variant
  - A ‘model variant’ is a single ONNX or ORT format model

#### manifest.json and metadata.json

A manifest.json may look like:

````
{
  "model_name": <logical_model_name>,
  "component_models": [
    <component_model_name_1>,
    <component_model_name_2>
  ]
}
````

A metadata.json for a component model may look like:

````
{
  "component_model_name": <component_model_name_1>,
  "model_variants": {
    <variant_name_1>: {
      "file": <ep_context_model_1 onnx file>,
      "constraints": {
        "ep": <ep_name>,
        "device": <device_type>,
        "architecture": <hardware_architecture>
      }
    },
    <variant_name_2>: {
      "file": <ep_context_model_2 onnx file>,
      "constraints": {
        "ep": <ep_name>,
        "device": <device_type>,
        "architecture": <hardware_architecture>
      }
    }
  }
}
````

#### Model Selection

The selection logic is implemented in `MatchesVariant()`, which evaluates the following constraints:

(Note: A constraint refers to a value under the "constraints" field in either manifest.json or metadata.json.)

- Check ep constraint
- Check device constraint
  - Some provider-bridge EPs may not implement `OrtEpFactory::GetSupportedDevices`, so ORT has no supported-device information for them. In that case, ORT skips the device constraint validation for those EPs.
  - If the provider options contain a key related to device type, its value must match the device constraint, if any.
- Check ep_compatibility_info constraint
  - ORT does not directly evaluate the architecture constraint. Instead, it relies on the ep_compatibility_info constraint, which may encode architecture information if needed.
  - The ep_compatibility_info value is expected to match the EP compatibility string stored in the EPContext model metadata. (See OrtEp::GetCompiledModelCompatibilityInfo() for how this string is generated.)
  - The EP implementation of EpFactory::ValidateCompiledModelCompatibilityInfo() is responsible for validating the compatibility string against the target device (i.e. OrtHardwareDevice) and returning the compatibility result.

#### Note

Check the unit test [here](https://github.com/microsoft/onnxruntime/pull/27786/changes#diff-bfa4122a85543ae2d80bf4cf6d9f85248e51c2276a5956af32f9bd8c8983d23a) to better understand how to use a model package.

#### Code Change

This pull request introduces significant enhancements to the execution provider (EP) selection and management infrastructure in ONNX Runtime. The main focus is on supporting more sophisticated device selection and manifest-based model packaging, as well as refactoring provider selection logic for modularity and future extensibility.

Key changes include:

- Introduction of model package context and manifest parsing to support selecting model components based on device and EP constraints.
- Refactoring of the execution provider interface and related classes to support multiple devices per provider.
- Modularization of EP/device selection, creation, and registration logic in the provider policy context.

The most important changes are:

**Model Package Context and Manifest Support**

- Added new files `model_package_context.h` and `model_package_context.cc` to implement manifest parsing, device/EP constraint matching, and component selection logic for model packages. This enables ONNX Runtime to select the most appropriate model variant based on available hardware and EP configuration. [[1]](diffhunk://#diff-006078879d52b421c973e2880c65db474aad6b21ad81ba69d387df8661bafeb2R1-R78) [[2]](diffhunk://#diff-45c29f481077e424c8969dc2198a8b40ab5908cf3b0bbf25dbeaca3ec51935d5R1-R279)

**Execution Provider Interface Enhancements**

- Updated the `IExecutionProvider` class to support construction with a list of `OrtEpDevice` pointers, and added a `GetEpDevices()` method to retrieve the supported devices. This allows plugin and bridge EPs to expose multiple devices. [[1]](diffhunk://#diff-e15769e35b807986b812aae3ff7192269e171c5846b2ff4d8ec571ec8ed57aa4R87-R104) [[2]](diffhunk://#diff-e15769e35b807986b812aae3ff7192269e171c5846b2ff4d8ec571ec8ed57aa4R203-R207)
- Updated plugin EP construction to pass the list of supported devices to the base class.

**Provider Policy Context Refactoring**

- Refactored provider policy context logic to modularize device ordering, device selection, telemetry logging, EP creation, and registration. This includes splitting the monolithic `SelectEpsForSession` into smaller methods: `OrderDevices`, `SelectEpDevices`, `LogTelemetry`, `CreateExecutionProviders`, `RegisterExecutionProviders`, and a new flow for model package-based EP selection. [[1]](diffhunk://#diff-dd9f398bec3f054aed2c930af620e3e1bfcc5b4a5d5667c4b0cd1f60ddfffda0R53-R58) [[2]](diffhunk://#diff-dd9f398bec3f054aed2c930af620e3e1bfcc5b4a5d5667c4b0cd1f60ddfffda0L118-L156) [[3]](diffhunk://#diff-dd9f398bec3f054aed2c930af620e3e1bfcc5b4a5d5667c4b0cd1f60ddfffda0L225-R199) [[4]](diffhunk://#diff-dd9f398bec3f054aed2c930af620e3e1bfcc5b4a5d5667c4b0cd1f60ddfffda0R254-R365)

These changes collectively lay the groundwork for more flexible, robust, and extensible device and EP selection in ONNX Runtime, especially in scenarios involving packaged models with multiple variants and complex hardware environments.

### Motivation and Context

<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
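
#### Illustrative sketch of variant matching

The ep and device checks listed under Model Selection can be sketched in a few lines of Python. This is not the `MatchesVariant()` implementation from this PR. The function names, the example EP names (`ExampleNpuEP`, `ExampleGpuEP`), the file paths, and the first-match ordering are all hypothetical, and the ep_compatibility_info check is omitted because it is delegated to the EP. The sketch only assumes the metadata.json layout shown above.

````python
import json

# Hypothetical metadata.json content following the layout shown above.
EXAMPLE_METADATA = """
{
  "component_model_name": "decoder",
  "model_variants": {
    "variant_a": {
      "file": "variant_a/model_ctx.onnx",
      "constraints": {"ep": "ExampleNpuEP", "device": "npu"}
    },
    "variant_b": {
      "file": "variant_b/model_ctx.onnx",
      "constraints": {"ep": "ExampleGpuEP", "device": "gpu"}
    }
  }
}
"""


def matches_variant(constraints, ep_name, device_type=None, ep_supported_devices=None):
    """Hypothetical stand-in for the ep and device constraint checks.

    device_type: device type taken from a provider option, if one was given.
    ep_supported_devices: device types the EP reports, or None when the EP
    does not report them (the device check against the EP is then skipped).
    """
    wanted_ep = constraints.get("ep")
    if wanted_ep is not None and wanted_ep != ep_name:
        return False

    wanted_device = constraints.get("device")
    if wanted_device is not None:
        # A device type given through provider options must match the constraint.
        if device_type is not None and device_type != wanted_device:
            return False
        # Only validate against the EP's devices when that information exists.
        if ep_supported_devices is not None and wanted_device not in ep_supported_devices:
            return False
    return True


def select_variant(metadata, ep_name, device_type=None, ep_supported_devices=None):
    """Return (variant_name, file) for the first matching variant, else None.

    First-match ordering is an assumption made for this sketch.
    """
    for name, variant in metadata["model_variants"].items():
        constraints = variant.get("constraints", {})
        if matches_variant(constraints, ep_name, device_type, ep_supported_devices):
            return name, variant["file"]
    return None


if __name__ == "__main__":
    metadata = json.loads(EXAMPLE_METADATA)
    print(select_variant(metadata, "ExampleGpuEP"))
    print(select_variant(metadata, "ExampleGpuEP", device_type="npu"))
````

Passing `ep_supported_devices=None` models the provider-bridge case in which the EP does not report its devices, so only the ep constraint and any provider-option device type are checked.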