[Pallas/Mosaic GPU] Add the `reduction_scratch_bytes` field to `CompilerParams`.
This field configures the number of scratch bytes reserved for performing
cross-warp reductions. The more bytes are allocated to such a reduction, the
more registers can be reduced in parallel, which yields faster reductions.
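
To illustrate the tradeoff, here is a rough toy model in plain Python. It is
not the Mosaic GPU implementation: the function name, the per-register cost
formula, and the 4-byte register size are all assumptions made purely for
illustration. It assumes each round stages one partial value per warp for
every register being reduced.

```python
import math


def reduction_rounds(num_registers, num_warps, scratch_bytes, bytes_per_register=4):
    """Hypothetical model: rounds needed to reduce registers across warps.

    Assumption (not taken from the actual lowering): reducing one register
    across warps needs num_warps * bytes_per_register bytes of scratch.
    """
    bytes_per_reduced_register = num_warps * bytes_per_register
    # How many registers fit in the scratch budget at once.
    registers_per_round = scratch_bytes // bytes_per_reduced_register
    if registers_per_round == 0:
        raise ValueError("scratch budget too small to reduce a single register")
    # Each round reduces up to `registers_per_round` registers in parallel.
    return math.ceil(num_registers / registers_per_round)


# Under this model, a larger scratch budget means fewer sequential rounds.
for scratch in (16, 128, 256):
    print(scratch, reduction_rounds(num_registers=16, num_warps=4, scratch_bytes=scratch))
```

Under these assumptions, growing the budget from 16 to 256 bytes cuts the
number of sequential rounds from 16 to 1, at the cost of reserving more
scratch space for the kernel.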
PiperOrigin-RevId: 860115656