[Hexagon] Support partial reduction intrinsics (#179797)
This commit adds the changes needed to use vrmpy instructions in full and partial multiply/add reductions on extended arguments. There are three main parts:
- partial reduction operations PARTIAL_REDUCE_(U|S|SU)MLA are lowered to accumulating vrmpy, including native and multiples of native vector sizes;
- full and partial reductions can be "split" into an inner partial reduction and a residual full or partial reduction, with the inner reduction then lowered to vrmpy by the first change;
- vecreduce_add expansion is moved from a generic pass to the Hexagon backend, accompanied by a set of tests.
In addition, there is a minor cleanup in HexagonTargetLowering::PerformDAGCombine().
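
The "split" described above can be illustrated with a scalar model. This is a sketch under stated assumptions, not the backend code: it assumes the accumulating vrmpy adds the products of four adjacent byte pairs into each 32-bit lane, and the lane count and the names partialReduceSMLA, reduceAdd and fullReduce are hypothetical, chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical sizes for illustration; real HVX vectors are much wider.
constexpr std::size_t kBytes = 16;
constexpr std::size_t kWords = kBytes / 4;

using ByteVec = std::array<int8_t, kBytes>;
using WordVec = std::array<int32_t, kWords>;

// Inner partial reduction (assumed vrmpy-like behavior): each 32-bit lane
// accumulates the products of four adjacent sign-extended byte pairs.
WordVec partialReduceSMLA(WordVec acc, const ByteVec& a, const ByteVec& b) {
  for (std::size_t i = 0; i < kBytes; ++i)
    acc[i / 4] += int32_t(a[i]) * int32_t(b[i]);
  return acc;
}

// Residual full reduction: add up the remaining 32-bit lanes.
int32_t reduceAdd(const WordVec& v) {
  return std::accumulate(v.begin(), v.end(), int32_t{0});
}

// Unsplit reference: extend, multiply, and reduce every element directly.
int32_t fullReduce(const ByteVec& a, const ByteVec& b) {
  int32_t sum = 0;
  for (std::size_t i = 0; i < kBytes; ++i)
    sum += int32_t(a[i]) * int32_t(b[i]);
  return sum;
}
```

In this model, reduceAdd(partialReduceSMLA(zero, a, b)) yields the same value as fullReduce(a, b), which is the property that lets a full reduction be rewritten as an inner partial reduction followed by a smaller residual one.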