[AArch64][PAC] Reduce the size of synchronous CFI (#96377)
For synchronous unwind tables, the call frame information can be
slightly reduced by bundling the `.cfi_negate_ra_state` directive with
the other CFI instructions in the prolog. This saves the 1 byte per
function that would otherwise be spent on an extra `DW_CFA_advance_loc`.
This was suggested in
[D156428](https://reviews.llvm.org/D156428#4554317).
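
To illustrate where the saved byte comes from, here is a rough sketch in Python. It is a simplified model, not the code in this patch: the prolog layout, the offsets, and the helper `count_advance_loc` are hypothetical. It assumes every change of code location in the CFI program costs one `DW_CFA_advance_loc`, which is encoded in 1 byte for small deltas.

```python
# Simplified model (assumption): a CFI program needs one DW_CFA_advance_loc
# each time the next directive is attached to a different code offset.

def count_advance_loc(directives):
    """Count location changes for (code_offset, directive) pairs in order."""
    advances = 0
    current = 0  # the CFI program starts at the function entry
    for offset, _name in directives:
        if offset != current:
            advances += 1
            current = offset
    return advances

# Hypothetical prolog: `paciasp` ends at offset 4, `stp x29, x30, ...` at 8.
# Before: .cfi_negate_ra_state sits on its own right after `paciasp`.
separate = [
    (4, ".cfi_negate_ra_state"),
    (8, ".cfi_def_cfa_offset 16"),
    (8, ".cfi_offset w30, -8"),
    (8, ".cfi_offset w29, -16"),
]

# After: .cfi_negate_ra_state is bundled with the other prolog directives.
bundled = [
    (8, ".cfi_negate_ra_state"),
    (8, ".cfi_def_cfa_offset 16"),
    (8, ".cfi_offset w30, -8"),
    (8, ".cfi_offset w29, -16"),
]

# Each advance is assumed to take 1 byte, so the difference is the saving.
saved_bytes = count_advance_loc(separate) - count_advance_loc(bundled)
print(saved_bytes)
```

In this model the bundled form needs one location change instead of two. The trade-off is that the return address state is not described precisely between the two instructions, which is presumably why the change is limited to synchronous unwind tables.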