[pallas:mosaic_gpu] Ported two pipelining optimizations to `emit_pipeline`

* Skip SMEM->GMEM copy if the destination buffer is being revisited
* Skip SMEM->GMEM copy if the corresponding index map does not use grid indices
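
A minimal pure-Python sketch of the decision these two optimizations encode, not the actual `emit_pipeline` code. The helper `writeback_steps` and the example index maps are hypothetical, and the sketch assumes grid steps run in row-major order and that an output block is written back only when the next step maps to a different block (or when the pipeline ends). Under that assumption, an index map that ignores the grid indices is just the limiting case of revisiting: a single copy after the last step.

```python
import itertools


def writeback_steps(grid, index_map):
    """Return the grid steps after which an SMEM->GMEM copy is issued.

    Hypothetical helper for illustration only. A copy is issued when the
    next step maps to a different output block, or on the final step.
    """
    steps = list(itertools.product(*(range(n) for n in grid)))
    copies = []
    for i, step in enumerate(steps):
        is_last = i == len(steps) - 1
        # Skip the copy while the same destination block is being revisited.
        if is_last or index_map(*steps[i + 1]) != index_map(*step):
            copies.append(step)
    return copies


# Output block depends only on the first grid axis, so it is revisited
# along the second axis and copied once per value of i.
revisited = writeback_steps((2, 3), lambda i, j: (i, 0))

# Index map does not use the grid indices at all: one copy at the end.
constant = writeback_steps((2, 3), lambda i, j: (0, 0))

print(revisited)
print(constant)
```

Without either check, the same `(2, 3)` grid would issue six copies, one per step.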

PiperOrigin-RevId: 696448043