S390x simd implementation (#25757)

Commit

67 days ago

### Description

This change adds SIMD-optimized implementation of functions for s390x. This implementation is based on similar functions for ppc64le.

#### Build System Integration (onnxruntime_mlas.cmake):

* Adds a new S390X flag to the CMake build system to detect the target architecture.
* Includes new source files specific to s390x (SgemmKernel.cpp, DgemmKernel.cpp, Quantize.cpp, qgemm_kernel_zvector.cpp, etc.).
* Sets the necessary compiler flags (-mvx, -mzvector, -march=z15) to enable z/Vector extensions.

#### Platform Abstraction (mlasi.h, platform.cpp):

* Defines MLAS_TARGET_S390X and MLAS_ZVECTOR_INTRINSICS for conditional compilation.
* Integrates the new s390x kernels into the MLAS_PLATFORM dispatch table.
* platform.cpp now checks for z/Vector support at runtime using getauxval(AT_HWCAP) and HWCAP_S390_VXE, allowing it to fall back to scalar implementations if the hardware support is not present.

#### New Kernel Implementations:

* qgemm_kernel_zvector.cpp: Implements quantized integer matrix multiplication. This is the core of the performance improvement for quantized models.
* SgemmKernelZVECTOR.cpp / DgemmKernelZVECTOR.h: Implements single and double-precision floating-point GEMM.
* QuantizeZVECTOR.cpp / Quantize.cpp: Implements quantization and requantization kernels.
* FgemmKernelZVECTOR.h: A generic header providing templates and macros for both single and double-precision GEMM, similar to the ppc64le implementation.

### Motivation and Context

This change improves performance of onnxruntime on s390x.
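The runtime check described under Platform Abstraction can be illustrated with a minimal sketch. This is not the code from platform.cpp: the `Platform` struct, `MakePlatform`, `SumScalar`, and `SumVectorStandIn` are hypothetical names invented for the example, and the vector kernel is only a stand-in that reuses the scalar loop. The fallback value for `HWCAP_S390_VXE` is an assumption taken from glibc's s390 hwcap header and is used only if the system headers do not define it. On non-s390x targets the probe simply reports no support, so the scalar path is selected.

```cpp
#include <cstddef>

#if defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#endif

// Assumption: bit value as in glibc's s390 <bits/hwcap.h>; used only when
// the system headers do not already provide the macro.
#if defined(__s390x__) && !defined(HWCAP_S390_VXE)
#define HWCAP_S390_VXE 8192
#endif

// Hypothetical kernel signature for this sketch (not an MLAS type).
using SumKernel = float (*)(const float*, std::size_t);

// Portable scalar fallback.
float SumScalar(const float* x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

// Stand-in for a z/Vector kernel. A real kernel would use vector
// intrinsics; here it delegates to the scalar loop so the sketch
// builds and runs on any target.
float SumVectorStandIn(const float* x, std::size_t n) {
    return SumScalar(x, n);
}

// Returns true only on s390x Linux when the kernel reports the
// vector-enhancements facility in the auxiliary vector.
bool HasZVectorEnhancements() {
#if defined(__s390x__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_S390_VXE) != 0;
#else
    return false;
#endif
}

// Hypothetical, much-reduced analogue of a dispatch table: it starts
// with scalar defaults and is upgraded once, at construction time.
struct Platform {
    SumKernel Sum = SumScalar;
    bool UsesVector = false;
};

Platform MakePlatform() {
    Platform p;
    if (HasZVectorEnhancements()) {
        p.Sum = SumVectorStandIn;
        p.UsesVector = true;
    }
    return p;
}
```

Callers would go through `MakePlatform().Sum(...)` rather than naming a kernel directly, so the choice between vector and scalar code is made once from the hardware capabilities instead of at every call site.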

References

#25757 - S390x simd implementation

Author

AlekseiNikiforovIBM

Parents

e0569fd9

onnxruntime 510dd14f - S390x simd implementation (#25757)
