Handle edge case with implicit input and multiple levels of subgraphs (#4031)
* Handle edge case where an implicit input for a subgraph may not get wired in correctly.
Conditions required:
- two or more levels of nested subgraph
- an implicit input defined above the bottom two levels is used in both of those subgraph levels
- this creates a NodeArg for the implicit input at both levels
- something changes the first-level subgraph so that it no longer uses the implicit input
- could be constant folding, or could be node partitioning that results in a copy of the implicit input being made to a different device
When that occurs, we lose the wiring through to the second level of nested subgraph, because there is still a NodeArg in the first level but the implicit input is no longer used there. Fix that by doing a final check for outer scope values once we know all the outputs produced by the current graph.
Found by commenting out the CUDA implementations of the control flow nodes and running ssd_mobilenet_300 from the mlperf models.
* Add test case.
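
Illustration only: the failure mode and the final check described above can be sketched with a small toy model. This is not the onnxruntime code. `Node`, `Graph`, `node_args`, `outer_scope_values`, and the `final_check` flag are all hypothetical names invented for this sketch, and it assumes a leftover NodeArg is what makes the first level treat the name as handled locally.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    inputs: List[str]
    outputs: List[str]
    subgraph: Optional["Graph"] = None  # set for control-flow style nodes


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    inputs: Set[str] = field(default_factory=set)     # explicit graph inputs
    node_args: Set[str] = field(default_factory=set)  # local entries, possibly stale


def outer_scope_values(graph: Graph, final_check: bool = True) -> Set[str]:
    """Return the names this graph needs from an enclosing graph (toy model)."""
    # Everything the current graph can supply by itself.
    produced = set(graph.inputs)
    for node in graph.nodes:
        produced.update(node.outputs)

    needed: Set[str] = set()
    deferred: Set[str] = set()
    for node in graph.nodes:
        needed.update(name for name in node.inputs if name not in produced)
        if node.subgraph is None:
            continue
        for name in outer_scope_values(node.subgraph, final_check):
            if name in produced:
                continue  # satisfied at this level
            if name in graph.node_args:
                # Assumption of this sketch: a local entry exists, so the
                # name looks handled here even though nothing produces it.
                deferred.add(name)
            else:
                needed.add(name)

    if final_check:
        # Final pass, run once all local outputs are known: anything a nested
        # subgraph needs that is not produced here must come from outside.
        needed.update(deferred)
    return needed


# Second level still reads "x"; the first level no longer uses it, but a
# stale entry for "x" remains in its node_args.
level2 = Graph(nodes=[Node(["x"], ["y"])], node_args={"x", "y"})
level1 = Graph(
    nodes=[Node(["cond"], ["out"], subgraph=level2)],
    inputs={"cond"},
    node_args={"x", "cond", "out"},
)

print(outer_scope_values(level1, final_check=False))
print(outer_scope_values(level1, final_check=True))
```

Without the final pass the toy first level reports no outer scope values, so `x` is never wired through to the second level. With it, `x` is reported as required from the enclosing graph.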