[X86] canonicalizeLaneShuffleWithRepeatedOps - avoid folding vperm2x128(vpshufd(load()),undef) -> vpshufd(vperm2x128(load(),undef)) (#178675)
There's no benefit to letting vperm2x128 fold the load in a unary
shuffle, and llvm-mca assumes there's an extra register dependency,
which confuses analysis.
Fixes #178632