a9583c1f - Vectorize softplus and its backward function on CPU (#32944)

Summary: Benchmarking shows a huge performance gain (2-7x faster). Also note that I removed Half support because it isn't generally supported on CPU.

Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz):

```python
import timeit

for op in ('Softplus',):
    print('Forward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(10_000, 10000), (100_000, 1000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('m(a)', setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype})', number=t))
    print('Backward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(10_000, 40000), (100_000, 4000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('y.backward(retain_graph=True)', setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype}, requires_grad=True); x = m(a); y = x.sum()', number=t))
```

Before:

```
Forward
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double
3.73130346799735
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.double
3.6790116359916283
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float
2.7477027159911813
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float
2.7382752639969112
Backward
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double
7.037510035006562
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double
5.855093962003593
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float
3.413616877005552
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float
2.5485514330066508
```

After:

```
Forward
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double
0.9465823079954134
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.double
0.8799468770012027
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float
0.39715987400268205
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float
0.3563060039887205
Backward
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double
2.400547721001203
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double
1.4740848699875642
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float
1.6684603010071442
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float
0.6815649690106511
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32944

Differential Revision: D19725407

Pulled By: VitalyFedyunin

fbshipit-source-id: 7430de838df731bd17617eff63f10107d5ad6b8b
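For reference, the elementwise math these kernels vectorize is small enough to sketch. Below is a minimal Python sketch, assuming PyTorch's documented softplus semantics with the default `beta=1` and `threshold=20`; the `softplus_ref` and `softplus_backward_ref` helpers are illustrative names, not code from this PR:

```python
import torch

# Hypothetical reference helpers (not part of the PR); they mirror the
# documented semantics of torch.nn.functional.softplus.

def softplus_ref(x, beta=1.0, threshold=20.0):
    # softplus(x) = (1/beta) * log(1 + exp(beta * x)); revert to the
    # identity where beta * x > threshold to avoid overflow in exp().
    z = beta * x
    return torch.where(z > threshold, x, torch.log1p(torch.exp(z)) / beta)

def softplus_backward_ref(grad_out, x, beta=1.0, threshold=20.0):
    # d/dx softplus(x) = sigmoid(beta * x); the thresholded (linear)
    # region passes the gradient through unchanged.
    z = beta * x
    return grad_out * torch.where(z > threshold, torch.ones_like(z), torch.sigmoid(z))

if __name__ == '__main__':
    # Quick self-check against the built-in op.
    x = torch.randn(100_000, dtype=torch.double)
    assert torch.allclose(softplus_ref(x), torch.nn.functional.softplus(x))
```

The speedup in the numbers above comes from evaluating exactly this kind of elementwise formula with vectorized CPU kernels rather than scalar loops; the sketch only pins down the math being computed.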