[llvm-mca][AArch64] Merge Neoverse NEON tests (NFC) (#170881)
Follow-on from #170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.
The approach is as follows:
- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so
first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but
differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores
primarily because of 24f0901 (#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM and FEAT_FCMA. N1 doesn't have these
    features but all other Neoverse cores do. Also adds coverage for
    N2/N3, which were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other
    Neoverse cores do. Also adds coverage for N1/N2/N3, which were
    missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other
    Neoverse cores do. Also adds coverage for N2/N3, which were
    missing tests.
- Bring the Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line.
Comparing N[1-3] against V[2-3], the only instruction the N cores test
that V[2-3] don't is:
```
< st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
---
> st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
```
So we take it for all cores. The rest of the diff is instructions in
V[2-3] that aren't in the N-core tests, so we take those as well.
All Neoverse cores can optionally support the Cryptographic Extension.
The related features (AES, ...) are enabled by default for V1/N1 but not
for the other cores, so they need to be enabled explicitly via -mattr.
- Finally, bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3:
  - loads/stores are blended
  - duplicates that differ only in spacing, such as
    `shll v0.2d, v0.2s, #32`, are removed
  - the rest of the diff is instructions in V1 that are not tested on the
    other cores, so we add them to the other cores' tests