[Pallas:MGPU] Tile the minor dimensions in the reduce_scatter kernel too
Unrolling along the minor dimensions is a bad idea when they are large. Instead of
adding another tile size parameter, we infer the major and minor tile sizes by
splitting the user-defined tile size, which is specified in elements.
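
As a rough illustration of the idea (not the code in this change), one way to
split a single tile size in elements into major and minor tile sizes is
sketched below. The helper name `split_tile_size`, its arguments, and the
divisibility requirements are all assumptions made for this example.

```python
def split_tile_size(tile_elems: int, minor_dim: int) -> tuple[int, int]:
    """Hypothetical helper: split a tile size in elements into (major, minor).

    Assumes a 2D view of the data with `minor_dim` elements per row. This is
    an illustrative sketch, not the kernel's actual inference logic.
    """
    if tile_elems <= 0 or minor_dim <= 0:
        raise ValueError("tile_elems and minor_dim must be positive")
    if tile_elems >= minor_dim:
        # The tile covers one or more whole rows: keep the full minor
        # dimension and spend the remaining budget on the major dimension.
        if tile_elems % minor_dim != 0:
            raise ValueError("tile_elems must be a multiple of minor_dim")
        return tile_elems // minor_dim, minor_dim
    # The tile is smaller than one row: tile the minor dimension itself
    # instead of unrolling along all of it.
    if minor_dim % tile_elems != 0:
        raise ValueError("minor_dim must be a multiple of tile_elems")
    return 1, tile_elems
```

Under these assumptions, a 4096-element tile over rows of 1024 elements gives a
(4, 1024) tile, while a 512-element tile over rows of 8192 elements gives a
(1, 512) tile, so a large minor dimension is tiled rather than unrolled.
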
PiperOrigin-RevId: 818651711