[Pallas:MGPU] Treat each warpgroup as a single logical thread.

As an additional minor change, we now disallow specifying the predicate
when uniform is unset, since that would mean using two different
mechanisms to select a single thread.

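A minimal sketch of the new argument rule, for illustration only. The name validate_copy_args and its signature are hypothetical and are not the actual Mosaic GPU API. Only the rule itself comes from this change: a predicate is rejected when uniform is unset.

```python
from typing import Optional


def validate_copy_args(uniform: bool, predicate: Optional[object] = None) -> None:
    """Hypothetical check mirroring the rule in this change.

    When uniform is unset, a single thread is already being selected by a
    different mechanism. Also accepting a predicate would mean two separate
    mechanisms selecting that one thread, so the combination is rejected.
    """
    if not uniform and predicate is not None:
        raise ValueError("predicate may only be specified when uniform is set")
```

Calls with uniform set (with or without a predicate), or with uniform unset and no predicate, pass; only the uniform-unset-plus-predicate combination raises.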
PiperOrigin-RevId: 689289365