VitisAI EP Context Model (#20926)
# Why so many commits
- Runtime debugging, which was necessary
- Three different approaches to the EP context model, which meant testing back and forth
- Windows compatibility issues, since development was done on Linux for convenience
# "Open" (?) questions
- Full offloading to a specific EP
- Dumping EP context models by EPs vs [by ONNXRT](https://github.com/microsoft/onnxruntime/blob/e2abba18ea9370329ce6894a4eb3e98ad8f11cb6/onnxruntime/core/framework/graph_partitioner.cc#L725)
- [Node name to pick nodes](https://github.com/microsoft/onnxruntime/blob/e2abba18ea9370329ce6894a4eb3e98ad8f11cb6/onnxruntime/core/framework/graph_partitioner.cc#L654)
# VitisAI EP has three variant implementations, each with its own pros and cons (and they can be combined)
## Serialize and cache the list of compute capabilities and the original ONNX model itself
## In `ComputeCapability()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key
## In `Compile()`, serialize and cache the backend compilation cache and the related necessary cache info such as cache dir and cache key
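Variants 2 and 3 both persist the backend compilation cache together with the cache info (cache dir and cache key) the backend needs to locate it again. As a rough, hedged illustration of how such cache info can reach the EP from user code, the sketch below passes it via provider options; the option names `cache_dir` and `cache_key` are hypothetical placeholders, not confirmed VitisAI EP spellings.

```python
# Hypothetical sketch: steering the EP's compilation cache location/key
# through provider options. "cache_dir" and "cache_key" are placeholder
# names; the exact spelling varies by EP and release.
import onnxruntime as onnxrt

sess_opts = onnxrt.SessionOptions()
sess = onnxrt.InferenceSession(
    "/some/path/to/original_model.onnx",
    sess_opts,
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"cache_dir": "/tmp/vitisai_cache",  # where the backend cache lives
                       "cache_key": "original_model"}],    # stable key identifying this model
)
```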
# EP context model creation
- Precondition: the session option `kOrtSessionOptionEpContextEnable` (aka "ep.context_enable") is enabled.
- Approach 1
- Steps
1. EP creates an ONNX model whose main graph has EP context nodes (i.e., the node op type is "EPContext"); see the sketch at the end of this section.
2. EP implements/overrides `IExecutionProvider::GetEpContextNodes()` method.
3. ONNXRT core creates an EP context model and saves/dumps it.
- `CreateEpContextModel()` in the file "graph_partitioner.cc"
- In `get_ep_context_node()`, `Node::Name()` is used to check whether a node is an EP context node. This restricts EP context model creation to `IExecutionProvider::Compile()`.
- The workaround is (1) not implementing `IExecutionProvider::GetEpContextNodes()` and (2) having the EP dump the EP context model itself.
4. Optionally, EP can also dump the EP context model it created by itself.
- Examples
- `QNNExecutionProvider`
- `VitisAIExecutionProvider`
- Approach 2
- Steps
1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext").
2. EP does NOT implement `IExecutionProvider::GetEpContextNodes()` at all.
3. EP dumps the EP context model it created.
- Examples
- `TensorrtExecutionProvider`
- UPDATES
- TRT EP is switching to leveraging `IExecutionProvider::GetEpContextNodes()`
- `OpenVINOExecutionProvider` (?)
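For illustration, here is a minimal, hedged sketch of the artifact both approaches produce: an ONNX model whose main graph carries "EPContext" nodes in the `com.microsoft` domain. The attributes used (`embed_mode`, `ep_cache_context`, `source`) follow the EPContext contrib-op schema; the node name and cache payload are made up.

```python
# Minimal sketch (not the actual EP dump code) of an EP context model:
# a graph whose node has op type "EPContext" in the "com.microsoft" domain.
import onnx
from onnx import TensorProto, helper

inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 224, 224])
out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1000])

ep_ctx_node = helper.make_node(
    "EPContext",
    inputs=["input"],
    outputs=["output"],
    name="vitisai_ep_context_0",        # made-up node name
    domain="com.microsoft",
    # embed_mode=1: cache bytes are embedded in ep_cache_context;
    # embed_mode=0: ep_cache_context holds a path to the cache file.
    embed_mode=1,
    ep_cache_context=b"<backend compilation cache bytes>",  # placeholder payload
    source="VitisAIExecutionProvider",  # EP that produced/consumes the cache
)

graph = helper.make_graph([ep_ctx_node], "ep_context_model", [inp], [out])
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 19),
                   helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "model_ctx.onnx")
```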
# What to cache in EP context nodes
- Non-compilation-based EPs
- Examples
- `VitisAIExecutionProvider`
- Characteristics
- The heavy lifting happens in `IExecutionProvider::GetCapability()`.
- Preconditions
- `IExecutionProvider::GetCapability()` is only called once by ONNXRT.
- Cache content
- Serialization of the list of `ComputeCapability` instances
- Not EP-specific
- Serialized using `onnx::FunctionProto` (see the sketch after this list)
- EP-specific cache
- Compilation-based EPs
- Examples
- `QNNExecutionProvider`
- `TensorrtExecutionProvider`
- `MIGraphXExecutionProvider`
- `OpenVINOExecutionProvider`
- Cache content
- EP-specific cache
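Below is a minimal Python sketch of the non-EP-specific part of that cache, assuming each `ComputeCapability` has already been reduced to the node indices it claims. The real serialization happens in C++ around `ComputeCapability`/`IndexedSubGraph`, and the domain name `com.example.ep_cache` is hypothetical, so take the shapes here as illustrative only.

```python
# Sketch: serialize each claimed partition as an onnx.FunctionProto,
# as in variant 1 ("serialize and cache the list of compute capabilities").
import onnx
from onnx import helper

def capability_to_function(model, node_indices, name):
    """Wrap the nodes of one claimed partition into a FunctionProto."""
    nodes = [model.graph.node[i] for i in node_indices]
    produced = {o for n in nodes for o in n.output}
    # Boundary inputs: values consumed but not produced inside the partition.
    inputs = sorted({i for n in nodes for i in n.input if i not in produced})
    outputs = [o for n in nodes for o in n.output]  # simplification: expose all
    return helper.make_function(
        domain="com.example.ep_cache",  # hypothetical domain
        fname=name,
        inputs=inputs,
        outputs=outputs,
        nodes=nodes,
        opset_imports=model.opset_import,
    )

model = onnx.load("original_model.onnx")
# Pretend GetCapability() claimed two partitions of the graph.
functions = [
    capability_to_function(model, [0, 1], "capability_0"),
    capability_to_function(model, [2], "capability_1"),
]
cache_blob = b"".join(f.SerializeToString() for f in functions)
```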
# Requirements
- Offline / AOT compilation of ONNX models with EP context cache
- Compile somewhere, run everywhere
- Pseudocode with a brief explanation
```
GenerateCache(original_onnx_file, cache_onnx_file)
    model_buffer = load(original_onnx_file)    --> Load the (encrypted) original ONNX model file
    model_buffer = decrypt(model_buffer)
    session_options = {
        kOrtSessionOptionEpContextEnable: true,
        kOrtSessionOptionEpContextFilePath: temp_file
    }                                          --> Set the necessary configs
    Ort::CreateSessionFromArray(model_buffer, session_options)
                                               --> The new ONNX model with EP context is created and dumped into the user-specified file "temp_file"
    temp_buffer = encrypt(temp_file)
    write(temp_buffer, cache_onnx_file)        --> Write the encrypted content of "temp_file" into "cache_onnx_file"

InitializeInferenceSession(cache_onnx_file)
    model_buffer = load(cache_onnx_file)       --> Load the EP context ONNX model generated in the previous step
    model_buffer = decrypt(model_buffer)
    session_options = { }
    Ort::CreateSessionFromArray(model_buffer, session_options)
                                               --> Create and initialize a session with the EP context model
```
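A hedged Python rendering of the pseudocode above: the XOR "cipher" is a stand-in for real encryption (it assumes the files on disk are stored XOR-encrypted), and it relies on `InferenceSession` also accepting model bytes, mirroring `Ort::CreateSessionFromArray`.

```python
import onnxruntime as onnxrt

def crypt(buf: bytes, key: int = 0x5A) -> bytes:
    # Placeholder for real encrypt/decrypt; XOR is its own inverse.
    return bytes(b ^ key for b in buf)

def generate_cache(original_onnx_file, cache_onnx_file, temp_file="temp_ctx.onnx"):
    with open(original_onnx_file, "rb") as f:
        model_buffer = crypt(f.read())  # load and "decrypt" the original model
    sess_opts = onnxrt.SessionOptions()
    sess_opts.add_session_config_entry("ep.context_enable", "1")
    sess_opts.add_session_config_entry("ep.context_file_path", temp_file)
    # Creating the session compiles the model and dumps the EP context
    # model into temp_file.
    onnxrt.InferenceSession(model_buffer, sess_opts,
                            providers=["VitisAIExecutionProvider"])
    with open(temp_file, "rb") as f, open(cache_onnx_file, "wb") as g:
        g.write(crypt(f.read()))  # re-encrypt and persist the EP context model

def initialize_inference_session(cache_onnx_file):
    with open(cache_onnx_file, "rb") as f:
        model_buffer = crypt(f.read())
    return onnxrt.InferenceSession(model_buffer, onnxrt.SessionOptions(),
                                   providers=["VitisAIExecutionProvider"])
```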
- Python code with comments
- EP context model creation
```python
import onnxruntime as onnxrt

# Session options for creating an ONNX model with EP context cache.
sess_opts = onnxrt.SessionOptions()
# Verbose logging.
sess_opts.log_severity_level = 0
# This is REQUIRED.
sess_opts.add_session_config_entry("ep.context_enable", "1")
# This is OPTIONAL.
# Either an absolute path (preferred for now) or a relative path (WIP) is okay.
# sess_opts.add_session_config_entry("ep.context_file_path", "/some/path/to/original_model_ctx.onnx")
# This is OPTIONAL.
sess_opts.add_session_config_entry("ep.context_embed_mode", "1")

orig_model_location = "/some/path/to/original_model.onnx"
sess = onnxrt.InferenceSession(orig_model_location, sess_opts,
                               providers=["VitisAIExecutionProvider"],
                               provider_options=[{}])
```
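If "ep.context_file_path" is left unset, the EP context model is dumped under a default name derived from the original model path (see the comments in the next snippet). With "ep.context_embed_mode" set to "1", the EP cache bytes are embedded in the EPContext nodes themselves rather than written to a separate file that the nodes reference by path.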
- Inference run with an EP context model
```python
import onnxruntime as onnxrt

# Session options for running inference with an EP context model.
sess_opts = onnxrt.SessionOptions()
# Default EP context model path.
# ep_ctx_model_location = "/some/path/to/original_model.onnx_ctx.onnx"
# User-configured EP context model path.
ep_ctx_model_location = "/some/path/to/original_model_ctx.onnx"
sess = onnxrt.InferenceSession(ep_ctx_model_location, sess_opts,
                               providers=["VitisAIExecutionProvider"],
                               provider_options=[{}])
# Populate with the model's actual input-name -> numpy-array pairs.
model_inputs = {}
run_opts = onnxrt.RunOptions()
# INFO-level logging (0 would be fully verbose).
run_opts.log_severity_level = 1
sess.run(None, model_inputs, run_opts)
```
---------
Co-authored-by: Glen Cao <glen@Glens-MacBook-Air.local>