[CIR][AArch64] Add lowering for unpredicated svdup builtins (#174433)
This PR adds CIR lowering support for unpredicated `svdup` SVE builtins.
The corresponding ACLE intrinsics are documented at:
* https://developer.arm.com/architectures/instruction-sets/intrinsics
(search for svdup).
Since LLVM provides a direct intrinsic for svdup with a 1:1 mapping, CIR
lowers these builtins by emitting a call to the corresponding LLVM
intrinsic.
DESIGN NOTES
------------
With this change, CIR can generally lower an ACLE intrinsic that has a
corresponding LLVM intrinsic by reusing the existing LLVM intrinsic metadata,
which avoids duplicating intrinsic-name definitions. The exception is
intrinsics that involve codegen-relevant `SVETypeFlags`. As a consequence, CIR
no longer emits NYI diagnostics for intrinsics that (a) have a known LLVM
intrinsic mapping and (b) do not use codegen-relevant `SVETypeFlags`; these
intrinsics are lowered directly.
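The rule above can be sketched as a small lookup. This is a minimal
illustration in plain C, not the code added by this PR: the struct, the table,
and `lookup_direct_lowering` are hypothetical names, the boolean stands in for
the codegen-relevant `SVETypeFlags` check, and only the `svdup_n_s8` row is
taken from the example in this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record describing one SVE builtin. */
struct sve_builtin_info {
    const char *builtin_name;   /* ACLE name, e.g. "svdup_n_s8" */
    const char *llvm_intrinsic; /* base LLVM intrinsic name, NULL if none */
    bool has_codegen_flags;     /* stand-in for codegen-relevant SVETypeFlags */
};

/* Illustrative table. Only the svdup_n_s8 row reflects this PR; the other
 * two rows are invented to exercise the fallback paths. */
static const struct sve_builtin_info sve_builtins[] = {
    {"svdup_n_s8", "llvm.aarch64.sve.dup.x", false},
    {"example_flagged", "llvm.example.flagged", true},
    {"example_unmapped", NULL, false},
};

/* Returns the base intrinsic name when the builtin can be lowered directly,
 * or NULL when the caller should fall back to an NYI diagnostic. */
const char *lookup_direct_lowering(const char *builtin_name)
{
    assert(builtin_name != NULL);
    size_t count = sizeof(sve_builtins) / sizeof(sve_builtins[0]);
    for (size_t i = 0; i < count; ++i) {
        const struct sve_builtin_info *info = &sve_builtins[i];
        if (strcmp(info->builtin_name, builtin_name) != 0)
            continue;
        /* Condition (a): a mapping exists. Condition (b): no special flags. */
        if (info->llvm_intrinsic == NULL || info->has_codegen_flags)
            return NULL;
        return info->llvm_intrinsic;
    }
    return NULL; /* unknown builtin */
}
```

In this sketch, a NULL result corresponds to the NYI path and any other result
to a direct call to the named intrinsic.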
IMPLEMENTATION NOTES
--------------------
* Intrinsic discovery logic mirrors the approach in
  CodeGen/TargetBuiltins/ARM.cpp, but is simplified since CIR only
  requires the intrinsic name.
* Test inputs are copied from the existing svdup tests:
  tests/CodeGen/AArch64/sve-intrinsics/acle_sve_dup.c.
* The LLVM IR produced _with_ and _without_ `-fclangir` is identical,
  modulo basic block labels, SROA, and function attributes.
EXAMPLE LOWERING
----------------
INPUT:
```C
#include <arm_sve.h>

svint8_t test_svdup_n_s8(int8_t op)
{
  return svdup_n_s8(op);
}
```
OUTPUT 1 (default):
```llvm
define dso_local <vscale x 16 x i8> @test_svdup_n_s8(i8 noundef %op) #0 {
entry:
  %op.addr = alloca i8, align 1
  store i8 %op, ptr %op.addr, align 1
  %0 = load i8, ptr %op.addr, align 1
  %1 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.x.nxv16i8(i8 %0)
  ret <vscale x 16 x i8> %1
}
```
OUTPUT 2 (via `-fclangir`):
```llvm
define dso_local <vscale x 16 x i8> @test_svdup_n_s8(i8 %0) #0 {
  %2 = alloca i8, i64 1, align 1
  %3 = alloca <vscale x 16 x i8>, i64 1, align 16
  store i8 %0, ptr %2, align 1
  %4 = load i8, ptr %2, align 1
  %5 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.x.nxv16i8(i8 %4)
  store <vscale x 16 x i8> %5, ptr %3, align 16
  %6 = load <vscale x 16 x i8>, ptr %3, align 16
  ret <vscale x 16 x i8> %6
}
```
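
Both outputs call `@llvm.aarch64.sve.dup.x.nxv16i8`, where the `.nxv16i8`
suffix encodes the overloaded result type `<vscale x 16 x i8>`. The sketch
below shows how such a name could be assembled for other element types. It
assumes full-width SVE data vectors, where the lane count is 128 divided by
the element width. The function `format_sve_intrinsic_name` is hypothetical
and is not part of this PR or of LLVM, which derives the suffix itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<base>.nxv<lanes><kind><bits>" into buf.
 * kind is 'i' (integer) or 'f' (floating point); bits is the element width.
 * Returns the number of characters written, or -1 on invalid input or
 * insufficient buffer space. */
int format_sve_intrinsic_name(char *buf, size_t size, const char *base,
                              char kind, unsigned bits)
{
    if (buf == NULL || base == NULL || size == 0 || strlen(base) == 0)
        return -1;
    if (kind != 'i' && kind != 'f')
        return -1;
    if (bits != 8 && bits != 16 && bits != 32 && bits != 64)
        return -1;
    if (kind == 'f' && bits == 8)
        return -1; /* this sketch does not model 8-bit float elements */

    /* Assumption: a full SVE data vector holds 128 bits per vscale unit. */
    unsigned lanes = 128u / bits;
    assert(lanes * bits == 128u);

    int written = snprintf(buf, size, "%s.nxv%u%c%u", base, lanes, kind, bits);
    if (written < 0 || (size_t)written >= size)
        return -1; /* encoding error or truncated output */
    return written;
}
```

Called with base `"llvm.aarch64.sve.dup.x"`, kind `'i'`, and width 8, the
helper reproduces the intrinsic name that appears in the IR above.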