[aarch64] Add Sbgemm kernel to accelerate fp32 tensor matmul with bfloat16 (#17031)
### Description
This PR adds the SbgemmKernel for aarch64. It includes the Sbgemm kernel,
which implements matrix multiplication with bfloat16 SIMD instructions
(bfmmla), and MatMul operator changes to invoke the Sbgemm kernel. To
enable the Sbgemm kernel, set the following session option:
"kOrtSessionOptionsGemmFastMathMode"
The PR also adds new test cases for MLAS and ORT.
### Motivation and Context
This is to improve MatMul performance on the aarch64 platform.
I ran the benchmarking script below (BERT, RoBERTa, and GPT-2 model
inference) on an AWS Graviton3-based c7g.4xl instance and observed a
1.2x-1.76x performance improvement over the sgemm (fp32) kernel.
```
cd onnxruntime/python/tools/transformers
python3 benchmark.py
```
The unit test precision results match the sgemm kernel results.

Build command used:
```
./build.sh --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync
```