[AArch64] Optimize vector fmul(sitofp/uitofp, 1/2^N) -> scvtf/ucvtf (#141480)
When a vector integer-to-float conversion is followed by a multiply with a
reciprocal power-of-two constant, we can fold both operations into a single
SCVTF or UCVTF instruction with a fixed-point shift operand.
For example, `fmul(sitofp(v2i32 x), <0.5, 0.5>)` becomes `scvtf.2s v0, v0, #1`.
This is a reworked version with several improvements over the original
submission:
- Rewrite the C++ operand matcher to share implementation with the existing
`SelectCVTFixedPointVec` (MOVIshift, FMOV, and DUP handling with correct
truncation for f16)
- Add `uitofp`/`ucvtf` patterns via a `CVTFRecipPat` multiclass
- Add full GlobalISel support (`GIComplexOperandMatcher` + renderer)
Supported vector types: `v2f32`, `v4f32`, `v2f64`, `v4f16`, `v8f16`.
Fixes #94909