[ADT] Refactor post order traversal (#191047)
Currently, po_iterator holds the traversal state. This makes copying
and moving po_iterator fairly expensive and the code cannot be optimized
away in several cases (most of it isn't even inlined in a default
build).
Therefore, refactor post-order traversal to hold the state in a wrapper
class with cheap iterators. Additionally, replace po_storage base class
with a CRTP implementation where users can provide their own storage.
Benefits:
- Performance in stage2-O3 improves by 0.19% instructions:u and even
more substantially in cycles/wall-time.
- Users that use a custom storage/iteration limitation can do so in a
more clean way by subclassing PostIteratorTraversalBase. See e.g.
LoopBlocksTraversal.
- For graphs with block numbers, reserving can now be implemented
reasonably easy (not done yet).
Implications:
- PostOrderTraversal::iterator is no longer a forward iterator. This
property was never really used, though.
- PostOrderTraversal must be live while iterators are live. For typical
uses (for (X x : post_order(...))), this is no problem, but could end
up being problematic if the iterator is wrapped (e.g.
for (X x : make_filter_range(post_order(...), ...)) -- problematic,
because make_filter_range doesn't preserve the range but only the two
iterators, which become invalid as the for loop is entered). This is a
limitation of the way LLVM implements ranges.