MLAS: quantized GEMM update (#6916)

Commit

4 years ago

MLAS: quantized GEMM update (#6916)

Various updates to the int8_t GEMMs:

1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2 (u8u8/u8s8/avxvnni) vs AVX512 (u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.
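To illustrate what per-column zero points for matrix B mean arithmetically, here is a minimal reference sketch in C++. It is not the MLAS code: the function names `QGemmReference` and `QGemmFactored`, the row-major layout, and the u8 x u8 data types are assumptions chosen for illustration. The second function shows a factoring that optimized kernels commonly use (raw integer dot products corrected with row sums of A and column sums of B); whether the kernels in this commit are structured exactly this way is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (not an MLAS API).
// A is M x K, B is K x N, C is M x N, all row-major.
// zero_point_a is one value for all of A; zero_point_b has one entry per
// column of B (length N).
std::vector<int32_t> QGemmReference(
    std::size_t M, std::size_t N, std::size_t K,
    const std::vector<uint8_t>& A, uint8_t zero_point_a,
    const std::vector<uint8_t>& B, const std::vector<uint8_t>& zero_point_b)
{
    std::vector<int32_t> C(M * N, 0);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k) {
                // Subtract the zero points before multiplying.
                const int32_t a = int32_t(A[m * K + k]) - int32_t(zero_point_a);
                const int32_t b = int32_t(B[k * N + n]) - int32_t(zero_point_b[n]);
                acc += a * b;
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}

// Same result, computed as a raw u8 x u8 dot product plus corrections:
//   sum_k (a - za)(b - zb[n])
//     = sum_k a*b - zb[n] * rowsum(A, m) - za * colsum(B, n) + K * za * zb[n]
std::vector<int32_t> QGemmFactored(
    std::size_t M, std::size_t N, std::size_t K,
    const std::vector<uint8_t>& A, uint8_t zero_point_a,
    const std::vector<uint8_t>& B, const std::vector<uint8_t>& zero_point_b)
{
    std::vector<int32_t> row_sum(M, 0);
    std::vector<int32_t> col_sum(N, 0);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k)
            row_sum[m] += int32_t(A[m * K + k]);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            col_sum[n] += int32_t(B[k * N + n]);

    std::vector<int32_t> C(M * N, 0);
    const int32_t za = int32_t(zero_point_a);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k)
                acc += int32_t(A[m * K + k]) * int32_t(B[k * N + n]);
            const int32_t zb = int32_t(zero_point_b[n]);
            C[m * N + n] = acc - zb * row_sum[m] - za * col_sum[n]
                         + int32_t(K) * za * zb;
        }
    }
    return C;
}
```

The factored form keeps the inner loop a plain unsigned 8-bit dot product with 32-bit accumulation, which is the shape of work that dot product instructions are designed for. With a per-column zero point, only the correction terms change from column to column.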

References

#6916 - MLAS: quantized GEMM update

Author

tracysh

Parents

bc319bd7

onnxruntime a8b897f7 - MLAS: quantized GEMM update (#6916)
