jax
b3948bfe - [Pallas:MGPU] Expose the optimized SMEM/GMEM copy layout

Commit
261 days ago
[Pallas:MGPU] Expose the optimized SMEM/GMEM copy layout

We can implement synchronous SMEM/GMEM copies using regular loads/stores with `plgpu.layout_cast` to the right layouts. We could alternatively do it as a dedicated primitive that calls `mgpu.copy_tiled`, but the current way is more future-proof. Once we transition fully to the layout inference pass, the layout cast should become unnecessary and the load/store should be the program we want to emit anyway.

PiperOrigin-RevId: 794567794