[X86] combineX86ShuffleChain - prefer combining to X86ISD::SHUF128 if PERMQ operands are splittable (#133900)
If the operand of a 512-bit unary shuffle is a concatenation of 128/256-bit subvectors, then we're better off using an X86ISD::SHUF128 node instead of a PERMQ, so that we can fold the concatenation into the shuffle as well.
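
For illustration only, here is a minimal standalone sketch of why a PERMQ-style mask that moves whole 128-bit lanes can also be expressed as a SHUF128 immediate. It is not the code from this patch: `widenToLaneMask` and `encodeShuf128Imm` are hypothetical helper names, and the imm8 layout is assumed to follow the 512-bit VSHUFF64X2/VSHUFI64X2 encoding (two bits per destination lane, with both sources being the same vector in the unary case).

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical helper (not an LLVM API): try to widen a unary 8 x i64 shuffle
// mask into a 4 x 128-bit lane mask. Returns std::nullopt if any pair of
// 64-bit elements does not move together as an aligned 128-bit lane.
std::optional<std::array<int, 4>>
widenToLaneMask(const std::array<int, 8> &Mask) {
  std::array<int, 4> Lanes{};
  for (int I = 0; I < 4; ++I) {
    int Lo = Mask[2 * I];
    int Hi = Mask[2 * I + 1];
    // The low element must start an aligned lane and the high element must
    // be its immediate neighbour.
    if (Lo < 0 || Lo > 6 || Lo % 2 != 0 || Hi != Lo + 1)
      return std::nullopt;
    Lanes[I] = Lo / 2;
  }
  return Lanes;
}

// Hypothetical helper: encode a unary 4-lane mask as an imm8, assuming the
// 512-bit VSHUFF64X2/VSHUFI64X2 layout where bits [2*I+1 : 2*I] pick the
// source lane for destination lane I.
unsigned encodeShuf128Imm(const std::array<int, 4> &Lanes) {
  unsigned Imm = 0;
  for (int I = 0; I < 4; ++I) {
    assert(Lanes[I] >= 0 && Lanes[I] < 4 && "lane index out of range");
    Imm |= static_cast<unsigned>(Lanes[I]) << (2 * I);
  }
  return Imm;
}
```

For example, the mask `{2, 3, 0, 1, 6, 7, 4, 5}` swaps the two 128-bit lanes inside each 256-bit half. It stays within each 256-bit half, so it is a PERMQ candidate, but it also widens to the lane mask `{1, 0, 3, 2}`, which is the form a 128-bit lane shuffle works with.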