[Pallas/Mosaic GPU] Enable lowering `semaphore_{read,signal,wait}_p` using warpgroup semantics.
This largely reuses the lowering code for lane semantics, since none of the
variables involved need to go through layout inference.
PiperOrigin-RevId: 881460161