[AArch64] More accurately model cost of partial reductions (#181707)
With #181706 using the cost-model to decide whether using partial
reductions is profitable, we need to more accurately represent the cost
of certain partial reduction operations:
* Reflect the fact that *MLALB/T instructions can be used for 16-bit ->
32-bit partial reductions (or *MLAL/MLAL2 for NEON).
* Calculate the cost of expanding the partial reduction in ISel for
reductions that don't have an explicit instruction, rather than
returning a random number. For sub-reductions we scale the cost to make
them slightly cheaper, so that they're still candidates for forming cdot
operations.