[AArch64][clang] Fix `__arm_atomic_store_with_stshh` ordering and lowering
`__builtin_arm_atomic_store_with_stshh` must satisfy two constraints:
- preserve release/seq_cst ordering in LLVM IR
- keep `stshh` immediately adjacent to the final store in codegen
The original target-intrinsic lowering preserved the final `stshh` + store
sequence, but it did not model ordering strongly enough in LLVM IR, so the
optimizer could sink earlier stores across the builtin.
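The sinking problem can be sketched in IR (the intrinsic name and signature here are illustrative, not the exact ones in tree):

```llvm
define void @example(ptr %flag, ptr %data) {
  ; Before the fix: nothing orders this store against the intrinsic
  ; call below, so the optimizer is free to sink it past the builtin.
  store i32 1, ptr %data
  ; hypothetical target intrinsic modeling the stshh + store sequence
  call void @llvm.aarch64.atomic.store.stshh.i32(ptr %flag, i32 1)
  ret void
}
```

With a `fence release` inserted before the call, the earlier store may no longer move across it.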
Fix this by inserting a `release` or `seq_cst` fence before the intrinsic
call. This preserves the ordering through optimization while still letting
the backend emit the required final instruction sequence; as a result, a
`dmb ish` is now emitted before the `stshh`.
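For a release store, the expected final sequence is then roughly the following (register choices and hint policy are illustrative):

```asm
    dmb ish              // lowered from the release fence
    stshh keep           // hint kept immediately adjacent to the store
    str  w1, [x0]        // the final store the hint applies to
```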
Also relax Sema for the builtin to accept storing 8/16/32/64-bit
floating-point and pointer values in addition to integers, and update
the diagnostic text accordingly.
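A sketch of what Sema now accepts; the argument order is an assumption, not the exact builtin signature:

```c
// Illustrative only: the builtin's exact signature may differ.
void stores(int *ip, double *dp, void **pp, void *p) {
  __builtin_arm_atomic_store_with_stshh(ip, 1);    // integer: accepted
  __builtin_arm_atomic_store_with_stshh(dp, 1.0);  // 64-bit FP: now accepted
  __builtin_arm_atomic_store_with_stshh(pp, p);    // pointer: now accepted
}
```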
Add coverage for:
- integer, floating-point, and pointer codegen cases
- Sema acceptance/rejection cases
- optimized-IR ordering regression coverage
- AArch64 assembly checks for the final release-store sequence
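The assembly checks could take roughly this shape (the RUN line and CHECK prefixes are hypothetical, not the exact test in this patch):

```c
// RUN: %clang_cc1 -triple aarch64 -O2 -S -o - %s | FileCheck %s
void release_store(int *p, int v) {
  __builtin_arm_atomic_store_with_stshh(p, v);
}
// CHECK:      dmb ish
// CHECK-NEXT: stshh keep
// CHECK-NEXT: str
```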