[Pallas][Mosaic GPU] Rename for_tensor_core to orders_tensor_core

The new name better reflects how this flag changes barrier semantics. Unless
it is set, nothing should be assumed about the relative ordering of tcgen05
ops and barrier operations.

In particular, suppose one thread awaits the completion of a tcgen05 op (e.g. a
load) and then signals another thread. When the second thread completes its
wait, it can't assume that the load has really been performed in its entirety
unless the barrier used to synchronize the two threads has
orders_tensor_core=True.

To me this is a big usability issue in the PTX design. It's unsafe by default
and requires us to insert additional fences to indicate which synchronization
primitives interact with the TensorCore.

PiperOrigin-RevId: 778030790