BERT CPU performance optimization: use mkldnn for nn.Linear() when input is dense layout (#21851)
Summary:
This PR improves BERT performance on CPU by using the `mkldnn` inner product for `nn.Linear()`.
Currently, `mkldnn` is used only when the `input` tensor has mkldnn layout. This PR loosens that condition: `mkldnn` is also used for `nn.Linear()` when the `input` tensor has dense layout. The ATen tensor is viewed in place as an `mkldnn` tensor, without an additional memory copy. Two adjustments are made to the input:
1. when `input.dim() >= 3`, it is viewed as a 2-D tensor, e.g. `[T, N, C]` is treated as `[T*N, C]`;
2. when `input` is not contiguous, it is first copied to contiguous memory, since the `mkldnn` inner product cannot handle non-contiguous memory (see the sketch after this list).
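A minimal sketch of the two input cases described above. The shapes and sizes are hypothetical, chosen only to illustrate the dense-layout fast path this PR describes:
```python
import torch
import torch.nn as nn

linear = nn.Linear(1024, 1024)

# Case 1: a dense 3-D input [T, N, C]. Per this PR, it is viewed internally
# as [T*N, C] (here [256, 1024]) for the mkldnn inner product, with no copy.
x = torch.randn(32, 8, 1024)
y = linear(x)
assert y.shape == (32, 8, 1024)

# Case 2: a non-contiguous input (e.g. after transpose) is first copied to
# contiguous memory, since the mkldnn inner product requires it.
x_t = torch.randn(8, 32, 1024).transpose(0, 1)   # non-contiguous [32, 8, 1024]
assert not x_t.is_contiguous()
y_t = linear(x_t)
assert y_t.shape == (32, 8, 1024)
```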
With this PR, BERT inference on `glue/MRPC` (batch size = 1) on a single Xeon 6148 socket (20 cores @ 2.5 GHz) improves throughput by `44%`:
1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```
Correspondingly, per-iteration latency drops from `59.92 ms` to `41.56 ms` (latency in ms = 1000 / throughput in it/s).
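For reference, a micro-benchmark sketch along these lines. The layer and input sizes are assumptions (typical of BERT-base), not the actual `glue/MRPC` run, and `torch.set_num_threads` pins the thread count to one socket as above:
```python
import time
import torch
import torch.nn as nn

torch.set_num_threads(20)        # single socket of the Xeon 6148 above

linear = nn.Linear(768, 768)     # BERT-base hidden size (assumption)
x = torch.randn(128, 1, 768)     # dense [T, N, C] input, batch size = 1

with torch.no_grad():
    for _ in range(10):          # warm-up
        linear(x)
    start = time.time()
    n_iter = 100
    for _ in range(n_iter):
        linear(x)
    avg_s = (time.time() - start) / n_iter

print(f"avg latency per call: {avg_s * 1e3:.3f} ms "
      f"({1.0 / avg_s:.2f} it/s)")
```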
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851
Differential Revision: D16056334
Pulled By: dzhulgakov
fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a