onnxruntime
f4bdbb8d - Annotation based partitioning along with resource accounting (#27595)

Commit
28 days ago
This pull request introduces support for node "layering annotations" and improves resource accounting and memory management during graph partitioning in ONNX Runtime. The changes add new mechanisms for annotating nodes, filtering nodes by annotation during partitioning, and efficiently accounting for resources in fused nodes. Several APIs are extended to support these features, and new configuration options are introduced to guide layer assignment.

**Layering annotations & partitioning:**

* Added `layering_annotation_` member and associated getter/setter/clear methods to the `Node` class, allowing nodes to be annotated for layer assignment. Also added a method to clear these annotations after partitioning to save memory. (`include/onnxruntime/core/graph/graph.h`)
* Extended the graph partitioning logic to support filtering nodes by their layering annotation using a `LayeringIndex`, ensuring only nodes matching the current execution provider's assignment are considered during partitioning. (`onnxruntime/core/framework/graph_partitioner.cc`)
* Added a new session option `kOrtSessionOptionsLayerAssignmentSettings` to configure layer assignment using annotation prefixes per device. (`include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h`)

**Resource accounting improvements:**

* Improved the `IResourceAccountant` interface to allow resetting and committing pending weights per node, and updated resource accounting logic to correctly sum and commit costs for all constituent nodes in fused nodes, preventing double-counting or undercounting. (`include/onnxruntime/core/framework/resource_accountant.h`, `include/onnxruntime/core/graph/indexed_sub_graph.h`, `onnxruntime/core/framework/graph_partitioner.cc`)

**API and code organization:**

* Updated the `Graph` class and related APIs to propagate layering annotations during function inlining and to provide a method for removing all layering annotations after partitioning. (`include/onnxruntime/core/graph/graph.h`)
* Moved the `CreateAccountants` function out of the `NodeStatsRecorder` class to the namespace level for clarity. (`include/onnxruntime/core/framework/resource_accountant.h`)

These changes enable more flexible and memory-efficient graph partitioning, particularly for scenarios involving hardware-specific layer assignments and dynamic resource constraints.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
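
To make the idea of filtering nodes by annotation prefix concrete, here is a minimal, self-contained C++17 sketch. It is not the ONNX Runtime implementation: `SketchNode`, `PrefixRule`, `ResolveDevice`, and `CandidateNodes` are hypothetical names invented for illustration, and the real `LayeringIndex` and the format of the layer assignment setting are not reproduced here. The sketch also assumes longest-prefix matching and that nodes without a resolvable annotation stay visible to every device; the commit message does not state either behavior.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a graph node that carries a layering annotation.
struct SketchNode {
  std::string name;
  std::string layering_annotation;  // empty: the node was never annotated
};

// Hypothetical rule: annotations that start with `prefix` go to `device`.
struct PrefixRule {
  std::string prefix;
  std::string device;
};

// Returns the device of the longest matching prefix, or nullopt if none match.
// Longest-prefix matching is an assumption of this sketch.
std::optional<std::string> ResolveDevice(const std::vector<PrefixRule>& rules,
                                         std::string_view annotation) {
  const PrefixRule* best = nullptr;
  for (const PrefixRule& rule : rules) {
    const bool matches =
        !rule.prefix.empty() &&
        annotation.substr(0, rule.prefix.size()) == rule.prefix;
    if (matches &&
        (best == nullptr || rule.prefix.size() > best->prefix.size())) {
      best = &rule;
    }
  }
  if (best == nullptr) return std::nullopt;
  return best->device;
}

// Names of the nodes a given device may consider, in graph order.
// Assumption: nodes with no resolvable annotation stay visible to all devices.
std::vector<std::string> CandidateNodes(const std::vector<SketchNode>& nodes,
                                        const std::vector<PrefixRule>& rules,
                                        std::string_view device) {
  std::vector<std::string> result;
  for (const SketchNode& node : nodes) {
    const std::optional<std::string> assigned =
        ResolveDevice(rules, node.layering_annotation);
    if (!assigned.has_value() || *assigned == device) {
      result.push_back(node.name);
    }
  }
  return result;
}
```

With rules mapping the prefix `layer0` to one device and the shorter prefix `layer` to another, an annotation such as `layer0_attn` resolves to the first device because the longer prefix wins. Each device's partitioning pass would then only see the nodes returned by `CandidateNodes` for that device.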