[Mosaic GPU] Introduce an optimization barrier op.
Also add layout inference and lowering rules for the op. Its initial use case
is fencing WGMMA accumulator registers. Since that use case involves only
registers, transform inference is not immediately useful for this op, so we
omit it here.
PiperOrigin-RevId: 738718000