Compute tile index using tile-based coordinates
This reduces the risk of 32-bit integer overflow when computing tile indices.
Add a unit test that reproduces the overflow in the previous implementation of `blocked_fold_in`.
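Illustrative sketch only, not the code in this change: the snippet below shows why linearizing in tile coordinates keeps the index small. The helper names (`wrap_int32`, `element_linear_index`, `tile_linear_index`), the 70,000 x 70,000 shape, and the 128 x 128 tile size are all hypothetical, and the element-based scheme is an assumed stand-in for an overflow-prone computation rather than the actual previous implementation.

```python
def wrap_int32(x: int) -> int:
    """Emulate two's-complement int32 wraparound for a Python int."""
    return (x + 2**31) % 2**32 - 2**31


def element_linear_index(row: int, col: int, total_cols: int) -> int:
    # Assumed overflow-prone scheme: linearize element coordinates in int32.
    return wrap_int32(row * total_cols + col)


def tile_linear_index(row: int, col: int, total_cols: int,
                      tile_rows: int, tile_cols: int) -> int:
    # Convert to tile coordinates first, then linearize over the tile grid.
    tiles_per_row = -(-total_cols // tile_cols)  # ceiling division
    tile_row = row // tile_rows
    tile_col = col // tile_cols
    return wrap_int32(tile_row * tiles_per_row + tile_col)


# Hypothetical large array: the last element of a 70,000 x 70,000 grid.
total_cols = 70_000
row, col = 69_999, 69_999

true_element_index = row * total_cols + col  # exceeds 2**31 - 1
element_index = element_linear_index(row, col, total_cols)
tile_index = tile_linear_index(row, col, total_cols, 128, 128)

print("element index wrapped:", element_index != true_element_index)
print("tile index:", tile_index)
```

With these assumed sizes the element-based index wraps around in int32, while the tile-based index stays several orders of magnitude below the int32 limit.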
PiperOrigin-RevId: 737778853