[BOLT][AArch64] Optimize the mov-imm-to-reg operation (#189304)
On AArch64, some special immediate values can be encoded directly in
logical immediate instructions. Even at `-O0`, the AArch64 backend does
not generate 4 instructions (movz, movk, movk, movk) to move such a
value into a 64-bit register.
For example, to move the 64-bit value `0x0001000100010001` to `x0`, the
AArch64 backend does not emit a 4-instruction sequence like
```
movz x0, 0x0001
movk x0, 0x0001, lsl 16
movk x0, 0x0001, lsl 32
movk x0, 0x0001, lsl 48
```
Instead, the AArch64 backend generates a single instruction
```
mov x0, 0x0001000100010001
```
which is essentially
```
orr x0, xzr, 0x0001000100010001
```
See `AArch64ExpandPseudoImpl::expandMOVImm` and
`AArch64_IMM::expandMOVImm` for the related implementation.
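
To illustrate which values qualify for the single-instruction form, here is
a minimal standalone sketch of a 64-bit logical-immediate check. It is not
the LLVM implementation; the helper name `isLogicalImm64` is hypothetical
and used only for illustration. It assumes the usual definition: a rotated
contiguous run of ones, replicated across elements of 2, 4, 8, 16, 32, or
64 bits, excluding all-zeros and all-ones.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns true if `v` can be
// encoded as an AArch64 64-bit logical (bitmask) immediate.
bool isLogicalImm64(uint64_t v) {
  // All-zeros and all-ones are not encodable as logical immediates.
  if (v == 0 || v == ~0ULL)
    return false;

  // Find the smallest element size whose replication yields `v`.
  unsigned size = 64;
  while (size > 2) {
    unsigned half = size / 2;
    uint64_t halfMask = (1ULL << half) - 1;
    if ((v & halfMask) != ((v >> half) & halfMask))
      break;
    size = half;
  }

  uint64_t mask = (size == 64) ? ~0ULL : ((1ULL << size) - 1);
  uint64_t elt = v & mask;

  // Rotate the element right by one bit within `size` bits. A single
  // circular run of ones has exactly two bit transitions.
  uint64_t rot = ((elt >> 1) | (elt << (size - 1))) & mask;
  return std::bitset<64>(elt ^ rot).count() == 2;
}
```

Under this sketch, `0x0001000100010001` is accepted (a 16-bit element
`0x0001` repeated four times), while an arbitrary value such as
`0x1234567890abcdef` is rejected and would need a movz/movk sequence.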
Therefore, we could leverage `expandMOVImm` in LLVM to optimize the
mov-imm-to-reg operation in BOLT, which would help speed up
BOLT-instrumented binaries.