jax
acc3a5b4 - [mgpu] Fix for mbarriers with `order_tensor_core=True`.

Commit
11 days ago
[mgpu] Fix for mbarriers with `order_tensor_core=True`.

At least in theory (according to PTX docs), there is no guaranteed ordering between `bar.sync` instructions (used by warpgroup barriers) and tcgen05 instructions, so we need to issue the warpgroup barrier after the `::before_thread_sync` instruction.

PiperOrigin-RevId: 881340629
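
The ordering described in the message can be sketched in CUDA with inline PTX. This is only an illustration and not the code Mosaic GPU emits: the helper name, the barrier id, and the 128-thread warpgroup size are assumptions, and the `::before_thread_sync` instruction is assumed to be the PTX `tcgen05.fence::before_thread_sync`. It needs a target that supports tcgen05 (for example `-arch=sm_100a`).

```cuda
// Hypothetical helper: a warpgroup barrier that is ordered after earlier
// tcgen05 operations issued by the calling threads.
__device__ __forceinline__ void warpgroup_barrier_after_tcgen05(unsigned barrier_id) {
  // First fence the preceding tcgen05 instructions, so that they are
  // ordered before the thread synchronization that follows.
  asm volatile("tcgen05.fence::before_thread_sync;" ::: "memory");
  // Only then issue the warpgroup barrier. 128 threads (4 warps) is the
  // assumed warpgroup size; barrier_id is an assumed named-barrier index.
  asm volatile("bar.sync %0, 128;" :: "r"(barrier_id) : "memory");
}

// Minimal kernel using the helper with one warpgroup per block.
__global__ void example_kernel() {
  // ... tcgen05 work would be issued here ...
  warpgroup_barrier_after_tcgen05(1);
}
```

Issuing `bar.sync` before the fence would rely on an ordering between `bar.sync` and tcgen05 instructions that, per the message, PTX does not guarantee.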