[AArch64] Improve codegen for some fixed-width partial reductions (#126529)

This patch teaches optimizeExtendOrTruncateConversion to bail out
if the user of a zero-extend is a partial reduction intrinsic
that we know will get lowered efficiently to a udot instruction.
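
For context, here is a minimal scalar sketch of what a 128-bit udot
computes: each of the four 32-bit accumulator lanes gains the sum of four
zero-extended 8-bit products. The name udotModel is hypothetical and is
not part of the patch or of LLVM. The sketch models only the instruction's
arithmetic, not the lowering code. It is meant to show why the zero-extend
needs no separate instruction when it feeds a partial reduction that
becomes a udot.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a 128-bit udot (4 x i32 accumulator,
// 16 x i8 inputs). Illustration only; not code from the patch.
std::array<uint32_t, 4> udotModel(std::array<uint32_t, 4> acc,
                                  const std::array<uint8_t, 16> &a,
                                  const std::array<uint8_t, 16> &b) {
  for (int lane = 0; lane < 4; ++lane) {
    for (int j = 0; j < 4; ++j) {
      // The widening from 8 to 32 bits happens as part of the
      // multiply-accumulate, so no standalone zero-extend is required.
      acc[lane] += static_cast<uint32_t>(a[4 * lane + j]) *
                   static_cast<uint32_t>(b[4 * lane + j]);
    }
  }
  return acc;
}
```

Since the extension is absorbed into the instruction in this way, rewriting
the zero-extend into a separate sequence beforehand would presumably get in
the way of forming the udot, which is the case the bail-out is there to
avoid.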