[AArch64] Improve getPartialReductionCost for fixed-width VFs (#126538)
NEON does not have a version of udot/sdot that accumulates into
64-bit integer lanes (its dot products only accumulate into 32-bit
lanes), so getPartialReductionCost should return Invalid when the
accumulator type is 64-bit and the VF is fixed-width.
In theory, if the 64-bit versions of SVE udot/sdot are available,
we could use those, but we don't currently have lowering support
for that.
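A minimal, self-contained sketch of the rule, assuming a simplified
model rather than LLVM's real TTI interface. VectorFactor,
partialReductionCostSketch and BaseCost are hypothetical names,
std::nullopt stands in for an Invalid cost, and every case other
than "fixed-width VF with a 64-bit accumulator" is deferred to the
unchanged cost logic, represented here by BaseCost.

```cpp
#include <optional>

// Hypothetical stand-in for a vectorization factor; not llvm::ElementCount.
struct VectorFactor {
  unsigned MinElts;  // minimum number of lanes
  bool Scalable;     // true for <vscale x N x ...> (SVE-style) VFs
};

// Simplified model: std::nullopt plays the role of an Invalid cost.
// BaseCost represents whatever the existing cost logic would return.
std::optional<unsigned> partialReductionCostSketch(unsigned AccumBits,
                                                   VectorFactor VF,
                                                   unsigned BaseCost) {
  // NEON udot/sdot only accumulate into 32-bit lanes, and fixed-width
  // VFs are not lowered to the 64-bit SVE forms, so there is no
  // dot-product lowering for this combination.
  if (!VF.Scalable && AccumBits == 64)
    return std::nullopt;
  // All other cases are left to the existing cost computation.
  return BaseCost;
}
```

For example, a fixed-width VF of 4 with a 32-bit accumulator keeps its
cost (NEON udot/sdot apply), while the same VF with a 64-bit
accumulator is rejected.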