[X86] lowerShuffleAsLanePermuteAndPermute - simplify lane crossing mask based on demanded elts
When building the lane-crossing shuffle mask, don't demand every element of each demanded sublane. Instead, set the undemanded mask elements to UNDEF so the mask can be simplified further (usually to a VBROADCAST).
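A minimal C++ sketch of the idea follows. It is not the actual X86ISelLowering code. The helper names `simplifyMaskByDemandedElts` and `isSplatIgnoringUndef` are hypothetical, and -1 stands for an UNDEF mask element, following LLVM's shuffle-mask convention. The sketch shows how masking out undemanded elements can turn a sublane-copying mask into a splat that a broadcast could lower.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// -1 marks an undefined ("don't care") mask element, as in LLVM shuffle masks.
constexpr int kUndef = -1;

// Hypothetical helper: replace every mask element whose result is not
// demanded with kUndef, leaving demanded elements unchanged.
std::vector<int> simplifyMaskByDemandedElts(const std::vector<int>& mask,
                                            const std::vector<bool>& demanded) {
  assert(mask.size() == demanded.size() && "mask/demanded size mismatch");
  std::vector<int> result(mask);
  for (std::size_t i = 0; i < result.size(); ++i)
    if (!demanded[i])
      result[i] = kUndef;
  return result;
}

// Hypothetical check: the mask is a splat if every defined element reads
// the same source element; undef elements are ignored. On success the
// splatted source index is written to splatIdx.
bool isSplatIgnoringUndef(const std::vector<int>& mask, int& splatIdx) {
  splatIdx = kUndef;
  for (int m : mask) {
    if (m == kUndef)
      continue;
    if (splatIdx == kUndef)
      splatIdx = m;
    else if (m != splatIdx)
      return false;
  }
  return splatIdx != kUndef;
}
```

For example, a mask of {4,5,6,7,4,5,6,7} that copies a whole sublane into both halves is not a splat. If only elements 0 and 4 are demanded, it becomes {4,u,u,u,4,u,u,u}, which splats source element 4.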
Fixes #66150