Annotation-based partitioning along with resource accounting (#27595)
This pull request introduces support for node "layering annotations" and
improves resource accounting and memory management during graph
partitioning in ONNX Runtime. The changes add new mechanisms for
annotating nodes, filtering nodes by annotation during partitioning, and
efficiently accounting for resources in fused nodes. Several APIs are
extended to support these features, and new configuration options are
introduced to guide layer assignment.
**Layering annotations & partitioning:**
* Added `layering_annotation_` member and associated getter/setter/clear
methods to the `Node` class, allowing nodes to be annotated for layer
assignment. Also added a method to clear these annotations after
partitioning to save memory. (`include/onnxruntime/core/graph/graph.h`)
[[1]](diffhunk://#diff-aaea1507ec81a94c72a1fa72ce320df712156b665f7798573be3f7e439bb4c37R177-R184)
[[2]](diffhunk://#diff-aaea1507ec81a94c72a1fa72ce320df712156b665f7798573be3f7e439bb4c37R266-R272)
[[3]](diffhunk://#diff-aaea1507ec81a94c72a1fa72ce320df712156b665f7798573be3f7e439bb4c37R702-R703)
* Extended the graph partitioning logic to filter nodes by their layering
annotation using a `LayeringIndex`, so that only nodes assigned to the
current execution provider are considered during partitioning.
(`onnxruntime/core/framework/graph_partitioner.cc`)
[[1]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bR155)
[[2]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bR199-R286)
[[3]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL244-R357)
[[4]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL433-R545)
[[5]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL451-R564)
[[6]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL477-R591)
* Added a new session option `kOrtSessionOptionsLayerAssignmentSettings`
to configure layer assignment using annotation prefixes per device.
(`include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h`)
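The annotation and filtering idea described in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SketchNode`, `SketchLayeringIndex`, and `FilterForDevice` are hypothetical names, and the longest-prefix matching and the treatment of unannotated or unmatched nodes are assumptions, since the description does not specify them.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the annotation part of a graph node.
// An empty string means "no annotation".
class SketchNode {
 public:
  explicit SketchNode(std::string name) : name_(std::move(name)) {}

  const std::string& Name() const { return name_; }
  const std::string& LayeringAnnotation() const { return layering_annotation_; }
  void SetLayeringAnnotation(std::string annotation) {
    layering_annotation_ = std::move(annotation);
  }
  // Swap with an empty string so the buffer can be released, not just emptied.
  void ClearLayeringAnnotation() { std::string().swap(layering_annotation_); }

 private:
  std::string name_;
  std::string layering_annotation_;
};

// Hypothetical index of (annotation prefix -> device) rules.
class SketchLayeringIndex {
 public:
  void AddRule(std::string prefix, std::string device) {
    rules_.emplace_back(std::move(prefix), std::move(device));
  }

  // Device of the longest matching prefix, or nullptr when no rule matches.
  // Assumption: longest-prefix-wins; the real matching rule may differ.
  const std::string* DeviceFor(const std::string& annotation) const {
    const std::pair<std::string, std::string>* best = nullptr;
    for (const auto& rule : rules_) {
      const std::string& prefix = rule.first;
      const bool matches = annotation.compare(0, prefix.size(), prefix) == 0;
      if (matches && (best == nullptr || prefix.size() > best->first.size())) {
        best = &rule;
      }
    }
    return best ? &best->second : nullptr;
  }

 private:
  std::vector<std::pair<std::string, std::string>> rules_;
};

// Keep only the nodes a given device may consider during partitioning.
// Assumption: nodes without an annotation, or whose annotation matches no
// rule, stay candidates for every device.
std::vector<const SketchNode*> FilterForDevice(const std::vector<SketchNode>& nodes,
                                               const SketchLayeringIndex& index,
                                               const std::string& device) {
  std::vector<const SketchNode*> candidates;
  for (const auto& node : nodes) {
    const std::string& annotation = node.LayeringAnnotation();
    const std::string* assigned =
        annotation.empty() ? nullptr : index.DeviceFor(annotation);
    if (assigned == nullptr || *assigned == device) {
      candidates.push_back(&node);
    }
  }
  return candidates;
}
```

In this sketch, a node annotated `layer0/attn` with a rule mapping prefix `layer0` to one device is hidden from every other device, and calling `ClearLayeringAnnotation()` on each node once partitioning is done releases the annotation strings.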
**Resource accounting improvements:**
* Extended the `IResourceAccountant` interface so that pending weights
can be reset and committed per node. Resource accounting now sums and
commits the costs of all constituent nodes of a fused node, which
prevents double-counting and undercounting.
(`include/onnxruntime/core/framework/resource_accountant.h`,
`include/onnxruntime/core/graph/indexed_sub_graph.h`,
`onnxruntime/core/framework/graph_partitioner.cc`)
[[1]](diffhunk://#diff-7b1c9ef14536f9a66ed370cb729b6609d12c5907b460d8f145a7ad5a401e0fb6L48-R72)
[[2]](diffhunk://#diff-3f09a80586759ee33e272477c3eb96f28d9b37f1e8251d13f1211c0450945135L89-R114)
[[3]](diffhunk://#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL391-L397)
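As a loose illustration of the per-node reset/commit idea, the sketch below keeps a tentative cost per node and moves it into the committed total once per constituent of a fused node. `SketchAccountant` and `CommitFusedNode` are hypothetical names chosen for this example; they are not the `IResourceAccountant` methods, whose actual signatures are not given in this description.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using NodeId = std::size_t;

// Hypothetical accountant: costs are first recorded as pending, then either
// reset (the node was not taken) or committed (the node was assigned).
class SketchAccountant {
 public:
  // Record a tentative cost for a node; a later call overwrites the earlier one.
  void AddPending(NodeId node, std::size_t cost) { pending_[node] = cost; }

  // Drop the tentative cost, e.g. when the node is not claimed after all.
  void ResetPending(NodeId node) { pending_.erase(node); }

  // Move the node's pending cost into the committed total.
  // Committing a node with nothing pending is a no-op, so a node that
  // appears twice is never counted twice.
  void CommitPending(NodeId node) {
    auto it = pending_.find(node);
    if (it == pending_.end()) return;
    consumed_ += it->second;
    pending_.erase(it);
  }

  std::size_t Consumed() const { return consumed_; }

 private:
  std::unordered_map<NodeId, std::size_t> pending_;
  std::size_t consumed_ = 0;
};

// Commit the cost of every constituent of a fused node and return the amount
// that this fused node added to the committed total.
std::size_t CommitFusedNode(SketchAccountant& accountant,
                            const std::vector<NodeId>& constituents) {
  const std::size_t before = accountant.Consumed();
  for (NodeId node : constituents) {
    accountant.CommitPending(node);
  }
  return accountant.Consumed() - before;
}
```

The design choice illustrated here is that committing consumes the pending entry, so summing over constituents is safe even if a node is listed more than once, and a node whose pending cost was reset contributes nothing.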
**API and code organization:**
* Updated the `Graph` class and related APIs to propagate layering
annotations during function inlining and to provide a method for
removing all layering annotations after partitioning.
(`include/onnxruntime/core/graph/graph.h`)
[[1]](diffhunk://#diff-aaea1507ec81a94c72a1fa72ce320df712156b665f7798573be3f7e439bb4c37R1341-R1346)
[[2]](diffhunk://#diff-aaea1507ec81a94c72a1fa72ce320df712156b665f7798573be3f7e439bb4c37R1590-R1594)
* Moved the `CreateAccountants` function out of the `NodeStatsRecorder`
class to the namespace level for clarity.
(`include/onnxruntime/core/framework/resource_accountant.h`)
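One way to picture annotation propagation during function inlining, and the bulk removal afterwards, is the sketch below. It is an assumption-laden illustration: `InlineSketchNode`, `InlineFunctionSketch`, and `RemoveAllLayeringAnnotationsSketch` are hypothetical names, and the rule that an inlined node inherits the calling node's annotation only when it has none of its own is a guess, not something this description states.

```cpp
#include <string>
#include <vector>

// Hypothetical node: only the fields needed for the illustration.
struct InlineSketchNode {
  std::string op_type;
  std::string layering_annotation;  // empty means "no annotation"
};

// Replace a function call node by copies of its body nodes.
// Assumption: each inlined node inherits the call node's annotation unless
// it already carries one.
std::vector<InlineSketchNode> InlineFunctionSketch(
    const InlineSketchNode& call_node,
    const std::vector<InlineSketchNode>& body) {
  std::vector<InlineSketchNode> inlined = body;
  for (auto& node : inlined) {
    if (node.layering_annotation.empty()) {
      node.layering_annotation = call_node.layering_annotation;
    }
  }
  return inlined;
}

// Drop every annotation once partitioning no longer needs them.
// Swapping with an empty string lets the storage be released.
void RemoveAllLayeringAnnotationsSketch(std::vector<InlineSketchNode>& nodes) {
  for (auto& node : nodes) {
    std::string().swap(node.layering_annotation);
  }
}
```

Under these assumptions, inlining a call node annotated `layer3` yields body nodes that all land in the same layer as the original call, so the filtering step sees a consistent assignment.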
These changes enable more flexible and memory-efficient graph
partitioning, particularly for scenarios involving hardware-specific
layer assignments and dynamic resource constraints.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>