[DFAJumpThreading] Rewrite the way paths are enumerated (#96127)
I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation coverage was
being reduced.
After looking at the code it seemed that the function was trying to
create paths of the form
`SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient
because a lot of nodes in those paths (nodes before DeterminatorBB)
would be irrelevant to the optimization. We only care about paths of the
form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a
lot of visited nodes.
In this patch I have added a hard limit to the number of nodes visited
and changed the algorithm for path calculation. Primarily I am
traversing the use-def chain for the PHI nodes that define the state. If
we have a hole in the use-def chain (no immediate predecessors) then I
call the paths() function.
I also had to the change the select instruction unfolding code to insert
redundant one input PHIs to allow the use of the use-def chain in
calculating the paths.
The test suite coverage with this patch (including a limit on nodes
visited) is as follows:
Geomean diff:
dfa-jump-threading.NumTransforms: +13.4%
dfa-jump-threading.NumCloned: +34.1%
dfa-jump-threading.NumPaths: -80.7%
Compile time effect vs baseline (pass enabled by default) is mostly
positive:
https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u
Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed