[Mosaic GPU] Add support for arbitrary reductions of tiled layouts

This significantly generalizes our support for reductions: nearly all tiled
layouts can now be handled out of the box. The new code is slightly longer
than the few special cases we implemented in the past, but it covers far
more layouts.

This also adds a Hypothesis test that checks that reductions return the
correct results, even for randomly sampled layouts. The test could still be
improved: it currently skips every case where we fail to synthesize a
load/store for the layout. Even so, it has already caught a number of
problems in the initial implementation.

PiperOrigin-RevId: 755303138