pytorch
d2a58bfe - Add mkldnn tanh operator (#54656)

Commit
3 years ago
Summary:

## :rocket: Feature

Add an Mkl-Layout kernel for tanh.

## Motivation

We want to add an Mkl-Layout kernel for tanh to improve tanh's performance when the input Tensor has Mkl layout. PyTorch currently has no Mkl-Layout kernel for tanh, so tanh cannot be executed directly on an Mkl-Layout Tensor. You can work around this by converting with to_dense/to_mkldnn, but the copy overhead reduces performance significantly (1.6-4.3 times slower than the CPU kernel).

## Performance results

### Environment

- CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
- OS: Ubuntu 18.04.1 LTS
- Compiler: gcc 7.5.0
- Branch: master
- Commit ID: fe2c126
- Build environment variable: USE_CUDA=0
- Python: 3.6.9
- Intel MKL (Math Kernel Library): 2020.2-254
- Intel oneDNN: 1.8.1

### Benchmark script

``` python
import torch
import torch.nn as nn

torch.manual_seed(1)
x = torch.randn(2048, 2048)
x_mkl = x.to_mkldnn()

print("### CPU tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### CPU tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.to_dense().tanh().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.to_dense().tanh_().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```

### Results

#### OMP_NUM_THREADS=1 (Self CPU time total, ms)

| Operation | CPU kernel | to_dense/to_mkldnn + CPU kernel | Mkl-Layout kernel (this PR) |
| --------- | ---------- | ------------------------------- | --------------------------- |
| tanh      | 579.662    | 1658.000                        | 617.565                     |
| tanh_     | 554.477    | 881.997                         | 589.426                     |

#### OMP_NUM_THREADS=6 (Self CPU time total, ms)

| Operation | CPU kernel | to_dense/to_mkldnn + CPU kernel | Mkl-Layout kernel (this PR) |
| --------- | ---------- | ------------------------------- | --------------------------- |
| tanh      | 182.387    | 421.336                         | 136.226                     |
| tanh_     | 94.331     | 404.931                         | 99.254                      |

## Modification policy for the code

oneDNN already supports the tanh operation ([oneDNN: Elementwise](https://spec.oneapi.com/versions/latest/elements/oneDNN/source/primitives/eltwise.html)). A sigmoid implementation that uses the same Elementwise API already exists, so the code in this PR was written with reference to that sigmoid implementation.

https://github.com/pytorch/pytorch/blob/527c1e0e37b7c65148bcbc390b65e94fb4624a9d/aten/src/ATen/native/mkldnn/UnaryOps.cpp#L28-L42

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54656

Test Plan: A test for sigmoid already exists, as shown below, so I added a new test for tanh modeled on it.

https://github.com/pytorch/pytorch/blob/527c1e0e37b7c65148bcbc390b65e94fb4624a9d/test/test_mkldnn.py#L944-L954

### mkldnn tanh test result

```
$ python3 test/test_mkldnn.py TestMkldnn.test_tanh
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK
```

Reviewed By: gchanan

Differential Revision: D27395827

Pulled By: ezyang

fbshipit-source-id: d4481332de187e2dea095f9b6aabc73a497960fe
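As a quick illustration of the user-visible change, here is a minimal sketch (not from the PR itself) comparing the old workaround path with the direct Mkl-Layout call this commit enables; it assumes a PyTorch build with MKL-DNN support and guards on `torch.backends.mkldnn.is_available()`:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 8)
expected = x.tanh()  # plain dense CPU kernel, always available

if torch.backends.mkldnn.is_available():
    x_mkl = x.to_mkldnn()
    # Workaround path (before this PR): round-trip through a dense tensor,
    # paying copy overhead on each conversion.
    via_dense = x_mkl.to_dense().tanh()
    # Direct path (after this PR): tanh runs on the Mkl-Layout tensor itself.
    direct = x_mkl.tanh().to_dense()
    assert torch.allclose(expected, via_dense)
    assert torch.allclose(expected, direct)
else:
    print("mkldnn not available in this build; skipping the Mkl-Layout paths")
```

Both paths produce the same values; the direct path simply avoids the to_dense/to_mkldnn copies that dominate the workaround's cost in the tables above.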