9d1492a4 - Add option to memory map .ORT model loads (#28164)

Addresses issue #25524 (MS internal: 60577894).

Today, the closest callers can get to loading a model from a shared resource is to map the model themselves and pass the bytes with `use_ort_model_bytes_directly`; this also puts the responsibility on the caller to keep the mapping valid for the session's lifetime. These changes introduce `use_memory_mapped_ort_model`, a session option that uses memory-mapped I/O to load ORT-format models directly inside ONNX Runtime. In this case the mapping is owned by the `InferenceSession`. The implementation is small and minimal and reuses ORT's existing platform-agnostic memory-mapping helpers; if we later choose to make this the default behavior, it could mean automatic memory savings for multi-process usage.

### Note about memory implications & sharing model bytes

Using `use_memory_mapped_ort_model` alone does not reduce long-running memory usage, because ORT ultimately copies the model bytes from the mapped pages into tensors. Using it together with `session.use_ort_model_bytes_for_initializers` ensures that initializers point directly at the flatbuffer bytes and avoids the extra copy; this is the expected usage for multi-process sharing of a single model. This raises questions about what the default behavior should be. The changes in this PR are conservative and retain all existing defaults at this time.
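The caller-owned mapping that `use_ort_model_bytes_directly` implies can be illustrated with a minimal, ORT-independent sketch using Python's stdlib `mmap`. The file here is a stand-in for a real `.ort` model; the comments note where the new option changes ownership:

```python
import mmap
import os
import tempfile

# Create a stand-in "model" file (a real .ort file would be used with ORT).
with tempfile.NamedTemporaryFile(delete=False, suffix=".ort") as f:
    f.write(b"ORTM" + bytes(1024))
    path = f.name

# Caller-side mapping: the pattern use_ort_model_bytes_directly expects.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm)      # zero-copy view over the mapped pages
    # With use_ort_model_bytes_directly, the caller must keep `mm`/`view`
    # alive for the session's lifetime; the new use_memory_mapped_ort_model
    # option moves that ownership into the InferenceSession instead.
    header = bytes(view[:4])   # this slice copies; the view itself does not
    view.release()
    mm.close()
os.remove(path)
print(header)  # b'ORTM'
```

The same page-sharing property is what makes the mapped load cheap across processes: read-only mapped pages are backed by the file once, rather than duplicated per process.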
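Assuming the new key is enabled like other entries in `onnxruntime_session_options_config_keys.h`, opting in from Python would look roughly like this (`add_session_config_entry` is the existing public API for session config keys; the model path is a placeholder):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# New key introduced by this PR:
so.add_session_config_entry("session.use_memory_mapped_ort_model", "1")
# Pair with direct initializers so tensors point at the mapped flatbuffer
# bytes instead of copies (the expected multi-process sharing setup):
so.add_session_config_entry("session.use_ort_model_bytes_for_initializers", "1")

# "model.ort" is a placeholder path to an ORT-format model.
sess = ort.InferenceSession("model.ort", sess_options=so)
```

This is a configuration sketch, not a tested snippet; the exact key string should be taken from the header once the PR is merged.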
### Changes

- **onnxruntime_session_options_config_keys.h** — new `session.use_memory_mapped_ort_model` config key
- **inference_session.h** — added an `Env::MappedMemoryPtr` member to hold the file mapping; updated existing comments to document the mmap path
- **inference_session.cc** — new `LoadOrtModelBytesMapped()` static function; updated `LoadOrtModel(PathString)` to check the config key and use mmap; updated `Initialize()` cleanup to release the mapping; updated the comment on initializer gating to note the mmap case
- **ort_model_only_test.cc** — two new tests: `LoadOrtFormatModelMemoryMapped` and `LoadOrtFormatModelMemoryMappedWithInitializersFromMap`
- Also checks in a benchmarking tool, `benchmark_mmap_ort.py`, for preservation; this is optional and can be omitted.
- Adds a flag to the perf tests used by the benchmark that holds the session open for a specified amount of time, which is useful for measuring memory-sharing changes. We can revert these and exclude the benchmark if they are not desired for check-in.

### Benchmark Examples

Note that the benchmark was largely written by GHCP and may not be perfect, but I've validated some of its results.

**Single-Proc**

Here is a sample result from a single-process benchmark using resnet50 (converted to ORT format). Note that these numbers measure peaks during construction, not end states, and the measurements may be imperfect.
`python tools/python/benchmark_mmap_ort.py --perf-test build\Windows\Release\Release\onnxruntime_perf_test.exe --model resnet50.ort --iterations 15`

| Configuration | Session Creation (ms) | Peak Private Commit (MB) | Peak Working Set (MB) | Session vs baseline | Private vs baseline |
|---|---|---|---|---|---|
| .ort standard load (baseline) | 193.13 | 222.9 | 235.9 | — | — |
| .ort memory-mapped load | 120.95 | 125.7 | 236.1 | **-37.4%** | **-43.6%** |
| .ort mmap + direct initializers | 14.87 | 109.6 | 120.6 | **-92.3%** | **-50.8%** |

**Multi-Proc**

The multi-process benchmark shows that total memory savings for shared models are only obtained when `session.use_ort_model_bytes_for_initializers` is also enabled:

| Configuration (4 processes) | Total Private (MB) | Total Working Set (MB) | Private vs baseline |
|---|---|---|---|
| .ort standard load (baseline) | 462.6 | 519.0 | — |
| .ort memory-mapped load | 462.1 | 518.5 | -0.1% |
| .ort mmap + direct initializers | 98.2 | 187.8 | **-78.8%** |

---------

Co-authored-by: Kevin Taha <kevintaha@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>