[Pallas:MGPU] Expose gathers in plgpu.copy_gmem_to_smem

To perform a gather, load the indices in the plgpu.Layout.TMA_GATHER_INDICES
layout and use the resulting array to index the source reference:
```
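# Load the gather indices in the layout required for TMA gathers.
idxs = plgpu.load(idx_ref, (), layout=plgpu.Layout.TMA_GATHER_INDICES)
# Indexing the source ref with the index array makes the copy a gather.
plgpu.copy_gmem_to_smem(x_ref.at[idxs], y_ref, barrier_ref)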
```
PiperOrigin-RevId: 796399052