jax
3f7d9106 - [Mosaic GPU] Fix predicate for `tcgen05_mma` lowering.

Commit
82 days ago
[Mosaic GPU] Fix predicate for `tcgen05_mma` lowering.

From https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-mma-instructions-mma:

> The instruction `tcgen05.mma` has single thread semantics, unlike the collective instructions `mma.sync` or `wgmma.mma_async`. So, a single thread issuing the `tcgen05.mma` will result in the initiation of the whole matrix multiply and accumulate operation.

This is consistent with LANE lowering semantics.

PiperOrigin-RevId: 880831793
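To illustrate why the choice of predicate matters for an instruction with single-thread semantics, here is a toy model in plain Python. It is not Mosaic GPU code, and it does not claim to reproduce the predicate used before or after this change. All names (`issuing_threads`, `every_thread`, `one_lane_per_warp`, `single_thread`) are hypothetical, and the 128-thread warpgroup size is an assumption made for the illustration.

```python
# Toy model only: counts how many threads would issue an instruction
# under different predicates. Every name here is hypothetical.
WARP_SIZE = 32          # threads per warp
WARPGROUP_SIZE = 128    # assumed: 4 warps per warpgroup


def issuing_threads(predicate, num_threads=WARPGROUP_SIZE):
    """Return the thread ids for which the predicate is true."""
    return [tid for tid in range(num_threads) if predicate(tid)]


def every_thread(tid):
    # Suits a collective instruction: all threads participate.
    return True


def one_lane_per_warp(tid):
    # Elects lane 0 of each warp: one issue per warp.
    return tid % WARP_SIZE == 0


def single_thread(tid):
    # Elects exactly one thread in the whole group.
    return tid == 0


for pred in (every_thread, one_lane_per_warp, single_thread):
    print(f"{pred.__name__}: {len(issuing_threads(pred))} issue(s)")
```

In this model, only `single_thread` produces exactly one issue. Since a single issuing thread initiates the whole matrix multiply-accumulate, a predicate that lets more than one thread through would issue the operation more than once.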