pytorch
e21f648c - improve mkldnn matmul performance when one input is contiguous tensor but the strides is not default contiguous strides (#99511)

Commit
1 year ago
Given the following case:

```
import torch

a = torch.empty_strided([64, 1, 33], [33, 3, 1], dtype=torch.bfloat16).fill_(1)
b = torch.randn(64, 33, 256).to(dtype=torch.bfloat16)
y = torch.ops.aten.bmm(a, b)
```

```a``` is a contiguous tensor, but its strides are not the default contiguous strides ([33, 33, 1]), so oneDNN matmul always runs a non-optimized path:

```
onednn_verbose,exec,cpu,matmul,gemm:jit,undef,src_bf16::blocked:abc:f0 wei_bf16::blocked:abc:f0 dst_bf16::blocked:abc:f0,attr-scratchpad:user ,,64x1x33:64x33x256:64x1x256,7.28711
```

This PR converts the inputs' strides to the default contiguous strides before calling oneDNN, so that it runs an optimized path:

```
onednn_verbose,exec,cpu,matmul,brg:avx512_core_amx_bf16,undef,src_bf16::blocked:abc:f0 wei_bf16::blocked:abc:f0 dst_bf16::blocked:abc:f0,attr-scratchpad:user ,,64x1x33:64x33x256:64x1x256,3.06396
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99511
Approved by: https://github.com/mingfeima, https://github.com/jgong5
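To illustrate why a tensor can be contiguous while its strides differ from the default ones, here is a minimal pure-Python sketch. It is not the code from this PR; the helper names `default_contiguous_strides` and `is_contiguous` are hypothetical and only mimic the usual rule that the stride of a size-1 dimension is never used for addressing and is therefore ignored by the contiguity check.

```python
def default_contiguous_strides(sizes):
    """Return the row-major (default contiguous) strides for `sizes`."""
    strides = [1] * len(sizes)
    running = 1
    # Walk from the innermost dimension outwards, accumulating the element count.
    for i in range(len(sizes) - 1, -1, -1):
        strides[i] = running
        running *= max(sizes[i], 1)
    return strides


def is_contiguous(sizes, strides):
    """Check row-major contiguity, skipping size-1 dimensions (assumed rule)."""
    if any(size == 0 for size in sizes):
        return True  # an empty tensor is treated as contiguous
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size == 1:
            continue  # stride of a size-1 dimension does not affect the layout
        if stride != expected:
            return False
        expected *= size
    return True


sizes, strides = [64, 1, 33], [33, 3, 1]
print(is_contiguous(sizes, strides))                  # contiguous layout
print(default_contiguous_strides(sizes))              # normalized strides
print(strides == default_contiguous_strides(sizes))   # strides still differ
```

For the tensor ```a``` above, the layout in memory is already dense; only the stride recorded for the size-1 dimension (3 instead of 33) differs. Rewriting the strides to the default values therefore requires no data copy. In PyTorch itself, the same information can be inspected with `a.is_contiguous()` and `a.stride()`.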