[MLIR][XeGPU][GPU] Optimize GPU to XeVM pipeline (#184711)
Some XeGPU transforms can generate code sequences that can be simplified
by folding, but full canonicalization is not required. Instead, remove
the canonicalize pass from the parts of the pipeline where only folding
is needed, and run folding at the end of the XeGPU blocking pass and
the XeGPU peephole optimize pass.