8348868: AArch64: Add backend support for SelectFromTwoVector
This patch adds AArch64 backend support for the SelectFromTwoVector
operation, which was recently introduced in the Vector API.
The operation is implemented with the two-table vector lookup
instruction "tbl", whose two-table form is available only in Neon and SVE2.
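As a rough illustration of the lookup that tbl performs with two table registers, the following Java sketch models it on byte arrays. Each index selects a byte from the concatenation of the two tables, and any index beyond the end of that concatenation yields zero. The class and method names are hypothetical, and the model ignores register encodings and element sizes other than bytes.

```java
import java.util.Arrays;

// Hypothetical scalar model of a two-table "tbl" lookup on bytes.
// Each index selects a byte from the concatenation table1 ++ table2;
// an index past the end of that concatenation produces 0.
final class TwoTableLookup {
    static byte[] tbl2(byte[] table1, byte[] table2, byte[] indexes) {
        if (table1.length != table2.length) {
            throw new IllegalArgumentException("tables must have equal length");
        }
        int n = table1.length;
        byte[] result = new byte[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int idx = indexes[i] & 0xFF; // index bytes are treated as unsigned
            if (idx < n) {
                result[i] = table1[idx];
            } else if (idx < 2 * n) {
                result[i] = table2[idx - n];
            } else {
                result[i] = 0; // out-of-range index
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] t1 = new byte[16];
        byte[] t2 = new byte[16];
        for (int i = 0; i < 16; i++) {
            t1[i] = (byte) i;         // first table: 0..15
            t2[i] = (byte) (100 + i); // second table: 100..115
        }
        byte[] idx = {0, 15, 16, 31, 32, (byte) 255};
        // prints [0, 15, 100, 115, 0, 0]
        System.out.println(Arrays.toString(tbl2(t1, t2, idx)));
    }
}
```

Indexes 16 to 31 reach into the second table, which is what lets a single instruction select lanes from two source vectors.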
For a 64-bit vector length, the Neon tbl instruction is generated for the
T_SHORT and T_BYTE types only.
For a 128-bit vector length, the Neon tbl instruction is generated if
UseSVE < 2, and the SVE2 tbl instruction is generated if UseSVE == 2.
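The selection rules for 64-bit and 128-bit vectors can be summarized in a small Java sketch. The enum and method names are hypothetical and do not correspond to HotSpot code. Treating 64-bit vectors of int, long, float, or double as not supported is an inference from the rule above, and vectors longer than 128 bits are left out of the sketch.

```java
// Hypothetical sketch of the instruction selection for 64-bit and 128-bit
// vectors. Names are illustrative only; this is not HotSpot code.
final class SelectFromTwoVectorDispatch {
    enum ElemType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

    enum Strategy { NEON_TBL, SVE2_TBL, NOT_SUPPORTED }

    static Strategy select(int vectorBits, ElemType type, int useSVE) {
        switch (vectorBits) {
            case 64:
                // Neon tbl only for byte and short lanes; other element types
                // are assumed here to be reported as not supported.
                return (type == ElemType.BYTE || type == ElemType.SHORT)
                        ? Strategy.NEON_TBL
                        : Strategy.NOT_SUPPORTED;
            case 128:
                // SVE2 tbl when UseSVE == 2, otherwise Neon tbl.
                return useSVE == 2 ? Strategy.SVE2_TBL : Strategy.NEON_TBL;
            default:
                throw new IllegalArgumentException(
                        "only 64- and 128-bit vectors are modelled: " + vectorBits);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(64, ElemType.SHORT, 0));  // NEON_TBL
        System.out.println(select(64, ElemType.INT, 0));    // NOT_SUPPORTED
        System.out.println(select(128, ElemType.LONG, 1));  // NEON_TBL
        System.out.println(select(128, ElemType.FLOAT, 2)); // SVE2_TBL
    }
}
```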
For vector lengths greater than 128 bits, there are currently no machines
that both have such a vector length and support SVE2. On machines with a
vector length greater than 128 bits and UseSVE < 2, this operation is not
supported: the inline expander fails, and lowered IR consisting of two
rearrange operations and one blend operation is generated instead.
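The equivalence between the operation and this lowered IR can be checked with a scalar model. The Java sketch below assumes that lane indexes are wrapped into [0, 2 * VLENGTH) and uses int lanes and hypothetical helper names. It rearranges both inputs with the same shuffle of in-range lane positions, then blends, taking the lane from the second input wherever the wrapped index is at least VLENGTH.

```java
import java.util.Arrays;

// Hypothetical scalar model comparing SelectFromTwoVector with a lowering
// built from two rearranges and one blend. Assumes lane indexes are wrapped
// into [0, 2 * vlen).
final class SelectFromLowering {
    // Direct semantics: wrapped index k selects lane k of v1 ++ v2.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] out = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = Math.floorMod(indexes[i], 2 * vlen);
            out[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return out;
    }

    // rearrange: out[i] = v[shuffle[i]]
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[shuffle[i]];
        }
        return out;
    }

    // blend: take b[i] where mask[i] is set, otherwise a[i]
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = mask[i] ? b[i] : a[i];
        }
        return out;
    }

    // Lowered form: two rearranges sharing one shuffle, then one blend.
    static int[] selectFromLowered(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] shuffle = new int[vlen];
        boolean[] fromV2 = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = Math.floorMod(indexes[i], 2 * vlen);
            shuffle[i] = idx % vlen;  // lane position within either input
            fromV2[i] = idx >= vlen;  // upper half of the range selects v2
        }
        return blend(rearrange(v1, shuffle), rearrange(v2, shuffle), fromV2);
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 3, -1};
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));        // [10, 21, 13, 23]
        System.out.println(Arrays.toString(selectFromLowered(indexes, v1, v2))); // [10, 21, 13, 23]
    }
}
```

The lowered form does three full vector operations where a single two-table tbl suffices, which is one reason a native backend implementation can be expected to help.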
This patch also adds a boolean, "need_load_shuffle", to the inline
expander for this operation to test whether the platform requires a
VectorLoadShuffle operation to be generated. Without it, the lowered
IR was not being generated on AArch64, and performance was poor.
Performance numbers with this patch on a 128-bit machine that supports
SVE2 are shown below:
Benchmark                                   (size)   Mode  Cnt   Gain
SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
The Gain column is the ratio of the throughput with this patch to the
throughput of the master branch with the inline expander changes applied.