jax
b3948bfe - [Pallas:MGPU] Expose the optimized SMEM/GMEM copy layout

Commit

261 days ago

[Pallas:MGPU] Expose the optimized SMEM/GMEM copy layout

We can implement synchronous SMEM/GMEM copies using regular loads/stores with `plgpu.layout_cast` to the right layouts. We could alternatively do it as a dedicated primitive that calls `mgpu.copy_tiled`, but the current way is more future-proof. Once we transition fully to the layout inference pass the layout cast should become unnecessary and the load/store should be the program we want to emit anyway.

PiperOrigin-RevId: 794567794

References

#30953 - [Pallas:MGPU] Expose the optimized SMEM/GMEM copy layout

Author

apaszke

Committer

Google-ML-Automation

Parents

8f87c00c
