Capacity aware partitioning (#22766)
### Description
Allow users to specify per-EP resource constraints.
Currently, models that do not fit into device memory fail with an error.
This PR lays the groundwork for EP-specific, resource-constrained graph
partitioning, subject to incremental feature additions.
Partitioning in this context means assigning graph nodes to a specific
device (Execution Provider)
up to a limit that is either automatically inferred or provided
by configuration.
In this implementation, we stop assigning nodes to CUDA once we reach
the specified memory limit.
This allows users to run models on devices with limited memory or other
constrained resources and
offload parts of the graph to the CPU or other EPs as configured.
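As a rough illustration of the behavior described above, here is a minimal sketch of limit-based assignment. The node names, byte counts, and the `partition` helper are all hypothetical, not the actual ONNX Runtime implementation: nodes are taken in graph order, placed on CUDA until the budget is reached, and everything after that point falls back to CPU.

```python
def partition(nodes, cuda_budget_bytes):
    """Assign nodes to CUDA in graph order until the memory budget is
    reached; once it is exhausted, the remaining nodes go to CPU.
    `nodes` is a list of (name, estimated_bytes) pairs. Hypothetical
    sketch only -- not the real partitioner."""
    assignment, used, cuda_full = {}, 0, False
    for name, size in nodes:
        if not cuda_full and used + size <= cuda_budget_bytes:
            assignment[name] = "CUDA"
            used += size
        else:
            # Stop assigning to CUDA after the first node that would
            # exceed the budget, mirroring the stop-at-limit behavior.
            cuda_full = True
            assignment[name] = "CPU"
    return assignment

nodes = [("embed", 400), ("block0", 300), ("block1", 300), ("head", 200)]
print(partition(nodes, cuda_budget_bytes=1000))
```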
The PR also introduces the ability to profile and save resource
consumption on a per-node basis.
The results of one or more runs are saved to a CSV file, which can then
be loaded to assist
partitioning.
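To show how recorded per-node stats might feed back into partitioning, here is a small parsing sketch. The CSV column names (`node_name`, `peak_memory_bytes`) are assumptions for illustration; the actual file format produced by the profiler may differ.

```python
import csv
import io

# Hypothetical CSV layout for per-node resource stats; real column
# names and fields may differ from what the profiler emits.
csv_text = """node_name,peak_memory_bytes
embed,400
block0,300
head,200
"""

def load_node_stats(text):
    """Parse per-node resource stats into {node_name: bytes}, suitable
    for use as size estimates when partitioning."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["node_name"]: int(row["peak_memory_bytes"]) for row in reader}

stats = load_node_stats(csv_text)
print(stats)
```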
Model-architecture-based partitioning (e.g. placing N transformer blocks
on GPU and the embeddings on CPU) is not implemented in this PR but will
come in the future.
### Motivation and Context
We want to allow models to run in constrained environments.
### Pending
Annotation-assisted partitioning