pytorch
26301447 - Call to mkldnn_matmul from aten::addmm on AArch64 (#91763)

Commit

2 years ago

Call to mkldnn_matmul from aten::addmm on AArch64 (#91763)

We have noticed that on BERT_pytorch in torchbenchmark the majority of time is spent running GEMM in aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it will be faster if it calls into mkldnn_matmul. Compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3. If fast math mode is enabled (through oneDNN and Arm Compute Library, mkldnn_matmul exposes an option to run GEMM with FP32 inputs using BF16 operations), it is 2.3x faster.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91763
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet
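To make the fast math trade-off concrete, here is a minimal pure-Python sketch of what addmm computes (`beta * input + alpha * (mat1 @ mat2)`) and of what running a GEMM with FP32 inputs through BF16 operations does to precision. It is an illustration only, not the oneDNN or Arm Compute Library implementation: the helper names `to_bf16` and `addmm` and the `fast_math` flag are hypothetical, and accumulating in a Python float is an assumption made for simplicity.

```python
import struct


def to_bf16(x: float) -> float:
    """Round an FP32 value to bfloat16 precision (round to nearest even).

    Illustrative helper: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Rounding bias: 0x7FFF plus the lowest bit that survives truncation.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep only the top 16 bits (sign, 8 exponent bits, 7 mantissa bits).
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def addmm(inp, mat1, mat2, beta=1.0, alpha=1.0, fast_math=False):
    """Reference addmm on nested lists: beta * inp + alpha * (mat1 @ mat2).

    With fast_math=True (hypothetical flag), both GEMM operands are rounded
    to bfloat16 before each multiply, mimicking a BF16 GEMM on FP32 inputs.
    """
    cast = to_bf16 if fast_math else (lambda v: v)
    n, k, m = len(mat1), len(mat2), len(mat2[0])
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = 0.0  # accumulate at higher precision than the operands
            for p in range(k):
                acc += cast(mat1[i][p]) * cast(mat2[p][j])
            row.append(beta * inp[i][j] + alpha * acc)
        out.append(row)
    return out


if __name__ == "__main__":
    bias = [[1.0, 1.0], [1.0, 1.0]]
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(addmm(bias, a, b))                   # full-precision path
    print(addmm(bias, a, b, fast_math=True))   # BF16-operand path
    # A value that needs more than 7 mantissa bits loses precision in BF16.
    print(addmm([[0.0]], [[1.00390625]], [[1.0]], fast_math=True))
```

Small integers are exactly representable in bfloat16, so the two paths agree on the first example; the last call shows where they diverge. That loss of mantissa bits is the accuracy cost paid for the larger speedup reported above.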

Author

milpuz01

Committer

pytorchmergebot

Parents

57c6f3fe

Files (7)

BUILD.bazel
aten/src/ATen
- Config.h.in
- native
  - LinearAlgebra.cpp
  - mkldnn
    - Matmul.cpp
- test
  - verify_api_visibility.cpp
buckbuild.bzl
cmake
- Dependencies.cmake
