jdk
2f9b3fba - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations

Commit
9 hours ago
8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations

PR [1] added several new vector operations to the Vector API, together with their X86 backend implementation. This patch adds the AArch64 backend part for the NEON and SVE architectures.

Throughput of the related Vector API JMH microbenchmarks improves by about 70x to 95x on an AArch64 SVE2 machine with a 128-bit vector length, across the different UseSVE options. The uplift details are:

```
Benchmark                    (size)  Mode   Cnt  -XX:UseSVE=0  -XX:UseSVE=1  -XX:UseSVE=2
ByteMaxVector.SADD           1024    thrpt  30   80.69x        79.70x        80.534x
ByteMaxVector.SADDMasked     1024    thrpt  30   84.08x        85.72x        85.901x
ByteMaxVector.SSUB           1024    thrpt  30   80.46x        80.27x        81.063x
ByteMaxVector.SSUBMasked     1024    thrpt  30   83.96x        85.26x        85.887x
ByteMaxVector.SUADD          1024    thrpt  30   80.43x        80.36x        81.761x
ByteMaxVector.SUADDMasked    1024    thrpt  30   83.40x        84.62x        85.199x
ByteMaxVector.SUSUB          1024    thrpt  30   79.93x        79.22x        79.714x
ByteMaxVector.SUSUBMasked    1024    thrpt  30   82.93x        85.02x        84.726x
ByteMaxVector.UMAX           1024    thrpt  30   78.73x        77.39x        78.220x
ByteMaxVector.UMAXMasked     1024    thrpt  30   82.62x        84.77x        85.531x
ByteMaxVector.UMIN           1024    thrpt  30   79.04x        77.80x        78.471x
ByteMaxVector.UMINMasked     1024    thrpt  30   83.11x        84.86x        86.126x
IntMaxVector.SADD            1024    thrpt  30   83.11x        83.07x        83.183x
IntMaxVector.SADDMasked      1024    thrpt  30   90.67x        91.80x        93.162x
IntMaxVector.SSUB            1024    thrpt  30   83.37x        82.82x        83.317x
IntMaxVector.SSUBMasked      1024    thrpt  30   90.85x        92.87x        94.201x
IntMaxVector.SUADD           1024    thrpt  30   82.76x        81.78x        82.679x
IntMaxVector.SUADDMasked     1024    thrpt  30   90.49x        91.93x        93.155x
IntMaxVector.SUSUB           1024    thrpt  30   82.92x        82.34x        82.525x
IntMaxVector.SUSUBMasked     1024    thrpt  30   90.60x        92.12x        92.951x
IntMaxVector.UMAX            1024    thrpt  30   82.40x        81.85x        82.242x
IntMaxVector.UMAXMasked      1024    thrpt  30   90.30x        92.10x        92.587x
IntMaxVector.UMIN            1024    thrpt  30   82.84x        81.43x        82.801x
IntMaxVector.UMINMasked      1024    thrpt  30   90.43x        91.49x        92.678x
LongMaxVector.SADD           1024    thrpt  30   82.01x        81.74x        82.153x
LongMaxVector.SADDMasked     1024    thrpt  30   91.61x        92.69x        93.579x
LongMaxVector.SSUB           1024    thrpt  30   81.97x        81.42x        82.991x
LongMaxVector.SSUBMasked     1024    thrpt  30   91.34x        92.47x        93.026x
LongMaxVector.SUADD          1024    thrpt  30   82.44x        81.29x        82.506x
LongMaxVector.SUADDMasked    1024    thrpt  30   92.21x        92.35x        93.419x
LongMaxVector.SUSUB          1024    thrpt  30   82.04x        80.98x        81.761x
LongMaxVector.SUSUBMasked    1024    thrpt  30   91.74x        92.39x        93.375x
LongMaxVector.UMAX           1024    thrpt  30   81.59x        80.21x        82.162x
LongMaxVector.UMAXMasked     1024    thrpt  30   70.09x        92.89x        93.627x
LongMaxVector.UMIN           1024    thrpt  30   82.31x        81.95x        82.298x
LongMaxVector.UMINMasked     1024    thrpt  30   69.85x        92.19x        93.390x
ShortMaxVector.SADD          1024    thrpt  30   80.08x        79.15x        80.310x
ShortMaxVector.SADDMasked    1024    thrpt  30   90.74x        92.00x        93.743x
ShortMaxVector.SSUB          1024    thrpt  30   79.54x        78.67x        80.584x
ShortMaxVector.SSUBMasked    1024    thrpt  30   91.18x        92.10x        93.725x
ShortMaxVector.SUADD         1024    thrpt  30   79.86x        79.37x        80.372x
ShortMaxVector.SUADDMasked   1024    thrpt  30   90.17x        92.43x        93.759x
ShortMaxVector.SUSUB         1024    thrpt  30   79.78x        79.85x        80.744x
ShortMaxVector.SUSUBMasked   1024    thrpt  30   89.99x        91.91x        93.320x
ShortMaxVector.UMAX          1024    thrpt  30   79.87x        79.81x        80.518x
ShortMaxVector.UMAXMasked    1024    thrpt  30   89.69x        91.70x        92.826x
ShortMaxVector.UMIN          1024    thrpt  30   79.11x        77.98x        79.458x
ShortMaxVector.UMINMasked    1024    thrpt  30   90.49x        92.86x        93.323x
```

Tested with `hotspot::hotspot_all` and `jdk::jdk_all`; no new regressions were found.

[1] https://github.com/openjdk/jdk/pull/20507
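For readers unfamiliar with the operations named in the benchmarks, the following is a scalar sketch of the per-lane semantics that SADD, SSUB, SUADD, SUSUB, UMAX and UMIN are assumed to have for byte lanes: signed saturating add/subtract, unsigned saturating add/subtract, and unsigned max/min. It is not taken from this patch. The class `SaturatingOps` and its method names are hypothetical, chosen only for illustration, and the mapping from benchmark names to these semantics is an assumption based on the commit title.

```java
// Scalar reference sketch (hypothetical, not part of the patch) of the
// per-lane semantics assumed for the byte variants of the new operations.
public class SaturatingOps {
    // SADD (assumed): signed add, clamped to [-128, 127] instead of wrapping.
    static byte saturatingAdd(byte a, byte b) {
        int r = a + b; // widened to int, so no overflow here
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, r));
    }

    // SSUB (assumed): signed subtract, clamped to [-128, 127].
    static byte saturatingSub(byte a, byte b) {
        int r = a - b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, r));
    }

    // SUADD (assumed): operands read as 0..255, result clamped to 255.
    static byte unsignedSaturatingAdd(byte a, byte b) {
        int r = Byte.toUnsignedInt(a) + Byte.toUnsignedInt(b);
        return (byte) Math.min(r, 0xFF);
    }

    // SUSUB (assumed): operands read as 0..255, result clamped to 0.
    static byte unsignedSaturatingSub(byte a, byte b) {
        int r = Byte.toUnsignedInt(a) - Byte.toUnsignedInt(b);
        return (byte) Math.max(r, 0);
    }

    // UMAX (assumed): maximum under unsigned comparison.
    static byte unsignedMax(byte a, byte b) {
        return Byte.toUnsignedInt(a) >= Byte.toUnsignedInt(b) ? a : b;
    }

    // UMIN (assumed): minimum under unsigned comparison.
    static byte unsignedMin(byte a, byte b) {
        return Byte.toUnsignedInt(a) <= Byte.toUnsignedInt(b) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd((byte) 100, (byte) 100));   // 127, not the wrapped -56
        System.out.println(saturatingSub((byte) -100, (byte) 100));  // -128
        System.out.println(Byte.toUnsignedInt(
                unsignedSaturatingAdd((byte) 200, (byte) 100)));     // 255
        System.out.println(unsignedSaturatingSub((byte) 10, (byte) 20)); // 0
        System.out.println(Byte.toUnsignedInt(
                unsignedMax((byte) -1, (byte) 1)));                  // 255
        System.out.println(unsignedMin((byte) -1, (byte) 1));        // 1
    }
}
```

A scalar loop over these helpers is roughly what a non-intrinsified fallback has to compute lane by lane, which gives some intuition for why a dedicated backend implementation can yield the large speedups reported in the table.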