Improve performance of prune_graph in onnx_model.py (#17502)
During optimization of the SDXL UNet, prune_graph takes up to 5 minutes.
The cause is that finding a node by scanning all nodes is time-consuming.
This optimization reduces the latency of prune_graph to 2 seconds.
The new algorithm uses a hash table (key is the node's first output,
value is the node) to speed up node lookup.
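
A minimal sketch of the hash-table idea, for illustration only. It is not
the code in onnx_model.py: the Node class, prune_nodes function, and the
breadth-first walk from the graph outputs are assumptions made for this
example. It shows how a dict keyed by each node's first output replaces a
scan over all nodes when looking up the producer of a tensor.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto (name, inputs, outputs)."""
    name: str
    inputs: list
    outputs: list


def prune_nodes(nodes, graph_outputs):
    """Return the nodes that contribute to graph_outputs, in original order.

    Assumption: a node is located through its first output only, as the
    commit message describes, so a node reached solely through a later
    output would not be found by this sketch.
    """
    # Hash table: first output name -> producing node. Each lookup is O(1)
    # on average instead of a linear scan over all nodes.
    producer = {n.outputs[0]: n for n in nodes if n.outputs}

    keep = set()
    queue = deque(graph_outputs)
    while queue:
        tensor = queue.popleft()
        node = producer.get(tensor)
        if node is None or node.name in keep:
            # Graph input, initializer, or a node that was already visited.
            continue
        keep.add(node.name)
        queue.extend(node.inputs)

    return [n for n in nodes if n.name in keep]


nodes = [
    Node("a", ["x"], ["t1"]),
    Node("dead", ["x"], ["t2"]),  # output is never consumed
    Node("b", ["t1"], ["y"]),
]
pruned = prune_nodes(nodes, ["y"])
print([n.name for n in pruned])
```

With N nodes, building the table is a single O(N) pass, and the walk then
touches each kept node once, so the total cost no longer grows
quadratically with graph size.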