[Mosaic GPU] Add a new layout, WGMMAColFragLayout, for loading a 1D array and broadcasting it along the leading dimension to a 2D shape that is used as an input to a wgmma.
In this new layout, the first 4 threads of a warp group hold 8 unique values. These values are replicated across the warp group according to (thread_idx % 4).
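
A rough sketch of the replication pattern described above, in plain Python rather than Mosaic GPU code. It assumes that the 8 unique values are split evenly as 2 consecutive values per thread and that a warp group has 128 threads; the even split is an assumption for illustration and is not stated in this change. The names values_held and broadcast_rows are hypothetical helpers, not part of the Mosaic GPU API.

```python
WARPGROUP_SIZE = 128   # threads in a warp group (4 warps of 32 threads)
GROUP = 4              # ownership is keyed on thread_idx % 4
VALUES_PER_THREAD = 2  # assumption: 8 unique values / 4 threads


def values_held(thread_idx, tile):
    """Elements of an 8-element tile held by one thread (illustrative only)."""
    base = (thread_idx % GROUP) * VALUES_PER_THREAD
    return tile[base:base + VALUES_PER_THREAD]


def broadcast_rows(row, num_rows):
    """Broadcast a 1D row along a new leading dimension to a 2D shape."""
    return [list(row) for _ in range(num_rows)]


tile = list(range(8))

# The first 4 threads together cover all 8 unique values.
first_four = [values_held(t, tile) for t in range(GROUP)]
covered = sorted(v for vals in first_four for v in vals)

# Every other thread repeats what the thread with the same thread_idx % 4 holds.
replicated = all(
    values_held(t, tile) == first_four[t % GROUP]
    for t in range(WARPGROUP_SIZE)
)

# Every row of the broadcast 2D array is a copy of the 1D input.
rows = broadcast_rows(tile, 3)
```

Under these assumptions, each row of the broadcast 2D array is identical, which is why a thread's values depend only on its column position (thread_idx % 4) and not on which row it owns.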
PiperOrigin-RevId: 740058172