Enabling L2+ Optimizations for EPs (#23517)
Some graph modifications are specific to an EP or its hardware.
ORT currently hardcodes the list of EPs for its optimizations, which
doesn't scale and is hard to extend to support EP custom optimizations.
This is a prototype that enables L2+ optimizations for EPs (the original
overview was provided by @skottmckay), along with the TRT EP
implementation of the ConstantFoldingDQ optimization.
Signatures for selection and optimization functions:
````
- Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)>
- Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>
````
GetCapability
- Call the (new) provider bridge API to look up a pre-defined optimizer
by name and get its selection function
- ComputeCapability.optimize_func, i.e. the optimization function, is
set by the optimizer to the function that does the optimization
- The EP has to update the returned ComputeCapability to include the
optimization ComputeCapability in nodes_to_optimize, so that ORT can
later perform the optimization/transformation accordingly
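The GetCapability steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `GraphViewer`, `Graph`, `Status`, `KeyValueConfig` and `ComputeCapability` are simplified stand-ins for the ORT types, `LookupSelectionFunc` is a hypothetical placeholder for the new provider bridge API, and the node indices are made up.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for the ORT types (assumptions, not the real classes).
struct GraphViewer {
  std::vector<int> all_nodes;  // every node index visible to the EP
  std::vector<int> dq_nodes;   // indices of DQ nodes eligible for folding
};
struct Graph {};
struct Status { bool ok = true; };
using KeyValueConfig = std::unordered_map<std::string, std::string>;

struct ComputeCapability {
  std::vector<int> nodes;
  std::function<Status(const Graph&, const ComputeCapability&, ComputeCapability&)> optimize_func;
  std::vector<std::unique_ptr<ComputeCapability>> nodes_to_optimize;
};

using SelectionFunc = std::function<std::vector<std::unique_ptr<ComputeCapability>>(
    const GraphViewer&, const KeyValueConfig&)>;

// Hypothetical stand-in for the provider bridge lookup: optimizer name -> selection function.
SelectionFunc LookupSelectionFunc(const std::string& name) {
  if (name != "ConstantFoldingDQ") return nullptr;
  return [](const GraphViewer& viewer, const KeyValueConfig&) {
    std::vector<std::unique_ptr<ComputeCapability>> selected;
    if (viewer.dq_nodes.empty()) return selected;
    auto cc = std::make_unique<ComputeCapability>();
    cc->nodes = viewer.dq_nodes;
    // The optimizer, not the EP, supplies the function that performs the optimization.
    cc->optimize_func = [](const Graph&, const ComputeCapability&, ComputeCapability&) {
      return Status{};  // placeholder body
    };
    selected.push_back(std::move(cc));
    return selected;
  };
}

// EP side: claim nodes, then attach the optimizer's selections to nodes_to_optimize.
std::vector<std::unique_ptr<ComputeCapability>> GetCapability(const GraphViewer& viewer) {
  auto ep_cc = std::make_unique<ComputeCapability>();
  ep_cc->nodes = viewer.all_nodes;
  if (SelectionFunc select = LookupSelectionFunc("ConstantFoldingDQ")) {
    ep_cc->nodes_to_optimize = select(viewer, KeyValueConfig{});
  }
  std::vector<std::unique_ptr<ComputeCapability>> result;
  result.push_back(std::move(ep_cc));
  return result;
}
```

In this sketch the EP never sees the optimization logic itself; it only asks for a selection function by name and forwards whatever the optimizer selected.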
GraphPartitioner
- After assigning the ComputeCapability to the EP and prior to Compile,
if the ComputeCapability has nodes_to_optimize, iterate that list
- The optimization function needs to be called with
  - a mutable Graph instance
  - the ComputeCapability for the individual optimization
  - the overall ComputeCapability so it can be updated
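A minimal sketch of that partitioner step, with all names hypothetical: `ApplyEpOptimizations` and `RemoveSelectedNodes` are illustrative helpers, and `Graph`, `Status` and `ComputeCapability` are reduced stand-ins for the ORT types. The sketch takes `Graph&` rather than the `const Graph&` in the signature listed earlier, on the assumption that the optimization has to edit the graph.

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <set>
#include <vector>

// Reduced stand-ins for the ORT types (assumptions, not the real classes).
struct Status { bool ok = true; };
struct Graph { std::set<int> nodes; };  // node indices currently in the graph

struct ComputeCapability {
  std::vector<int> nodes;
  // Assumption: mutable Graph&, since the optimization modifies the graph.
  std::function<Status(Graph&, const ComputeCapability&, ComputeCapability&)> optimize_func;
  std::vector<std::unique_ptr<ComputeCapability>> nodes_to_optimize;
};

// Hypothetical optimization: remove the selected nodes from the graph and
// from the overall capability so the EP no longer claims them.
Status RemoveSelectedNodes(Graph& graph,
                           const ComputeCapability& this_optimization,
                           ComputeCapability& cc_to_update) {
  for (int n : this_optimization.nodes) {
    graph.nodes.erase(n);
    auto& claimed = cc_to_update.nodes;
    claimed.erase(std::remove(claimed.begin(), claimed.end(), n), claimed.end());
  }
  return Status{};
}

// Runs after the capability is assigned to the EP and before Compile.
Status ApplyEpOptimizations(Graph& graph, ComputeCapability& capability) {
  for (const auto& optimization : capability.nodes_to_optimize) {
    if (!optimization || !optimization->optimize_func) continue;
    // Arguments: mutable graph, the individual optimization, the overall capability.
    Status status = optimization->optimize_func(graph, *optimization, capability);
    if (!status.ok) return status;  // stop at the first failure
  }
  return Status{};
}
```

Passing the overall ComputeCapability alongside the individual one lets each optimization keep the EP's claimed node list in sync with the nodes it adds or removes.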