jax
cdae5fcf - [Mosaic GPU] Make sure to do the async proxy fence before warpgroup sync

Commit
320 days ago
[Mosaic GPU] Make sure to do the async proxy fence before warpgroup sync

This is the ordering we want for a proper release of generic SMEM stores into the async proxy. The old order was problematic: once the warpgroup barrier completed, some warps could get descheduled before they reached the fence. As long as the first warp kept making progress, it could go through the fence alone and start issuing TMA copies before the other warps had synchronized with the async proxy.

I have not observed this problem in any of our kernels so far, but this order seems safer to me.

PiperOrigin-RevId: 733333814
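The ordering described in the message can be illustrated with a hand-written CUDA sketch. This is not the code Mosaic GPU emits: the helper names, the kernel `stage_tile`, the barrier id, and the plain copy standing in for the TMA transfer are all assumptions made for illustration. The only parts taken from PTX itself are the `fence.proxy.async.shared::cta` and `bar.sync` instructions. It assumes a single 128-thread warpgroup and an sm_90 or newer target.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: make this thread's generic-proxy SMEM writes
// visible to the async proxy. The instruction needs sm_90 or newer.
__device__ __forceinline__ void fence_proxy_async_smem() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
  asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
#endif
}

// Hypothetical helper: barrier over one 128-thread warpgroup.
// Barrier id 1 is an arbitrary choice for this sketch.
__device__ __forceinline__ void warpgroup_barrier() {
  asm volatile("bar.sync 1, 128;" ::: "memory");
}

__global__ void stage_tile(const float* in, float* out) {
  __shared__ float tile[128];

  // 1. Every thread stores into SMEM through the generic proxy.
  tile[threadIdx.x] = in[threadIdx.x] * 2.0f;

  // 2. Every thread fences its own stores BEFORE arriving at the barrier.
  fence_proxy_async_smem();

  // 3. Once the barrier completes, all 128 threads have already fenced.
  warpgroup_barrier();

  // 4. One thread may now hand the tile to the async proxy. A real kernel
  //    would issue a TMA copy here; this sketch uses a plain copy instead
  //    so that it stays self-contained (no tensor map setup).
  if (threadIdx.x == 0) {
    for (int i = 0; i < 128; ++i) out[i] = tile[i];
  }
}

int main() {
  float *in, *out;
  cudaMallocManaged(&in, 128 * sizeof(float));
  cudaMallocManaged(&out, 128 * sizeof(float));
  for (int i = 0; i < 128; ++i) in[i] = static_cast<float>(i);

  stage_tile<<<1, 128>>>(in, out);
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  std::printf("out[5] = %f\n", out[5]);

  cudaFree(in);
  cudaFree(out);
  return 0;
}
```

With steps 2 and 3 swapped (the old order), a warp that leaves the barrier first could fence only its own stores and reach step 4 while other warps have not yet executed their fence, which is the hazard the commit message describes.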