[mlir][SCF] Fix region branch op interfaces for `scf.forall` and its terminator (#174221)
`scf.forall` implements the `RegionBranchOpInterface` only partially: its
terminator, `scf.forall.in_parallel`, does not implement the
`RegionBranchTerminatorOpInterface`.
An incomplete interface implementation is a problem for transformations
that reason about control flow by querying the
`RegionBranchOpInterface`.
What is wrong with the current implementation, in detail:
- There is exactly one region branch point: "parent". `in_parallel` is
not a region branch point because it does not implement the
`RegionBranchTerminatorOpInterface`. (Clarified in #174978.)
- `ForallOp::getSuccessorRegions(parent)` returns one region successor:
the region of the `scf.forall` op.
- Since there is no region branch point in the region, there is no way
to leave the region. This means: once you enter the region, you are
stuck in it indefinitely. (It is unspecified what happens once you are
in the region, but we can be sure that you cannot leave it.)
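The consequence of the bullets above can be illustrated with a small, self-contained C++ sketch. It is a hypothetical toy model, not the MLIR API: `Point`, `Cfg`, `forallCfgBeforeFix` and `reachable` are names invented for illustration. It encodes only the edges described above (parent to body, nothing out of the body) and checks reachability.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy model, NOT the MLIR API: the places control can be,
// as seen through RegionBranchOpInterface for an `scf.forall` op.
enum class Point { Parent, Body };

// Maps each point to the points control may transfer to next.
using Cfg = std::map<Point, std::vector<Point>>;

// Edges before this commit: "parent" branches into the body. The body has
// no region branch terminator, so no edge leaves it.
Cfg forallCfgBeforeFix() {
  return {{Point::Parent, {Point::Body}}, {Point::Body, {}}};
}

// Worklist search: true if `to` is reachable from `from` in >= 1 step.
bool reachable(const Cfg &cfg, Point from, Point to) {
  std::set<Point> visited;
  std::vector<Point> worklist{from};
  while (!worklist.empty()) {
    Point p = worklist.back();
    worklist.pop_back();
    auto it = cfg.find(p);
    if (it == cfg.end())
      continue;
    for (Point succ : it->second) {
      if (succ == to)
        return true;
      if (visited.insert(succ).second)
        worklist.push_back(succ);
    }
  }
  return false;
}
```

In this model, `reachable(forallCfgBeforeFix(), Point::Parent, Point::Body)` is true while `reachable(forallCfgBeforeFix(), Point::Body, Point::Parent)` is false: an analysis that trusts the interface sees a region that can be entered but never left.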
This commit adds the `RegionBranchTerminatorOpInterface` (via
`ReturnLike`) to `scf.forall.in_parallel` to indicate that after leaving
the region, control flow returns to the parent. (Note: Only block
terminators in directly nested regions can be region branch terminators,
so `in_parallel` is the only candidate. In particular,
`parallel_insert_slice` cannot be a region branch terminator.)
This commit also removes all successor operands and successor inputs
from the implementation. The number of successor operands and successor
inputs must match, but `scf.forall.in_parallel` has no operands.
Therefore, the region must also have 0 successor inputs, and in turn the
`scf.forall` op must have 0 successor operands.
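The matching rule used in this argument can be sketched as a per-edge check. This is a hypothetical toy model, not the MLIR API or its verifier: `Edge` and `isWellFormed` are invented names, and an edge is reduced to two counts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model, NOT the MLIR API: one control flow edge and the
// number of values it forwards (successor operands) and receives
// (successor inputs).
struct Edge {
  const char *from;
  const char *to;
  std::size_t numSuccessorOperands;
  std::size_t numSuccessorInputs;
};

// The counts must agree for the edge to be well formed.
bool isWellFormed(const Edge &e) {
  return e.numSuccessorOperands == e.numSuccessorInputs;
}
```

For example, an edge leaving `in_parallel` forwards 0 operands, so `Edge{"in_parallel", "parent", 0, 0}` passes the check while `Edge{"in_parallel", "parent", 0, 1}` does not.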
This commit also adds a missing control flow edge from "parent" to
"parent": in case of 0 threads, the region is not entered.
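The control flow graph after this commit can be summarized in the same hypothetical toy model (again not the MLIR API; `Point`, `Cfg`, `forallCfgAfterFix` and `hasEdge` are invented names). It contains the edge into the body, the new edge from the body back to the parent (from `in_parallel` being `ReturnLike`), and the new parent-to-parent edge for the 0-thread case.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy model, NOT the MLIR API.
enum class Point { Parent, Body };
using Cfg = std::map<Point, std::vector<Point>>;

// Edges after this commit:
//  - parent -> body:   the region is entered (at least one thread).
//  - parent -> parent: 0 threads, the region is not entered.
//  - body -> parent:   `scf.forall.in_parallel` is ReturnLike, so leaving
//                      the region returns control to the parent.
Cfg forallCfgAfterFix() {
  return {{Point::Parent, {Point::Body, Point::Parent}},
          {Point::Body, {Point::Parent}}};
}

// True if the model has a direct edge `from` -> `to`.
bool hasEdge(const Cfg &cfg, Point from, Point to) {
  auto it = cfg.find(from);
  if (it == cfg.end())
    return false;
  return std::find(it->second.begin(), it->second.end(), to) !=
         it->second.end();
}
```

With these three edges, every path that enters the region can also leave it, and an analysis also accounts for the case where the body never runs.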
Depends on #174978.