[Mosaic GPU] Only split collective TMAs along (multiple) major dimensions
Each TMA only writes to a contiguous subset of SMEM, so skipping a major
dimension while splitting results in incorrect code. To work around the
loss of flexibility, we now allow splitting multiple leading dimensions
to handle larger clusters and tiled references.
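
As an illustration only, the leading-dimension split could be sketched as
below. This is not the code in this change: `partition_leading_dims` and its
arguments are hypothetical names, and the sketch assumes the work is divided
evenly among `num_blocks` blocks. It greedily splits dimensions from the
front and stops at the first dimension that cannot absorb the remaining
factor, so no major dimension is skipped.

```python
def partition_leading_dims(shape, num_blocks):
    """Return per-dimension split factors whose product is num_blocks.

    Hypothetical helper: only a prefix of `shape` is ever split, so each
    block's slice stays contiguous in a row-major layout.
    """
    partition = [1] * len(shape)
    remaining = num_blocks
    for i, size in enumerate(shape):
        if remaining == 1:
            break
        if size % remaining == 0:
            # This dimension absorbs the whole remaining factor.
            partition[i] = remaining
            remaining = 1
        elif remaining % size == 0:
            # Split this dimension fully and carry the rest to the next one.
            partition[i] = size
            remaining //= size
        else:
            # Splitting a later dimension instead would skip this one.
            break
    if remaining != 1:
        raise ValueError(
            f"cannot split leading dims of {tuple(shape)} across {num_blocks} blocks"
        )
    return partition
```

For example, under these assumptions a (2, 4, 64) slice shared by 4 blocks
is split as [2, 2, 1], while a (3, 64) slice shared by 4 blocks is rejected
rather than split along the minor dimension.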
PiperOrigin-RevId: 655700486