Vectorize SmoothL1Loss forward (CPU) (#37115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37115
Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz):
```python
import timeit
for op in ('SmoothL1Loss',):
    print('Forward')
    for dtype in ('torch.double', 'torch.float', 'torch.bfloat16'):
        for n, t in [(10_000, 100000),
                     (100_000, 10000)]:
            print(f'torch.nn.{op}()(a, b), |a-b|>1, numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('m(a, b)', setup=f'import torch; m = torch.nn.{op}(); a = torch.full(({n},), 1, dtype={dtype}); b = torch.full(({n},), 3, dtype={dtype})', number=t))
            print(f'torch.nn.{op}()(a, b), |a-b|<1, numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('m(a, b)', setup=f'import torch; m = torch.nn.{op}(); a = torch.full(({n},), 1, dtype={dtype}); b = torch.full(({n},), 1.5, dtype={dtype})', number=t))
```
Results:
Before:
```
Forward
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.double
2.8427017140056705
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.double
2.823863306999556
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.double
0.9239509999897564
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.double
0.9014650480094133
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.float
2.4530331650021253
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.float
2.4551637870026752
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.float
0.5716871829936281
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.float
0.5748704470024677
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.bfloat16
9.777982015002635
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.bfloat16
12.627838339001755
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.bfloat16
7.810075458997744
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.bfloat16
10.73597132100258
```
After:
```
Forward
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.double
2.8420191049808636
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.double
2.8814279660000466
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.double
0.9491433810035232
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.double
0.9144560259883292
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.float
2.4458729829930235
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.float
2.4474395569995977
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.float
0.5676976410031784
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.float
0.5793530470109545
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 10000 for 100000 times, dtype=torch.bfloat16
4.32380092900712
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 10000 for 100000 times, dtype=torch.bfloat16
4.332892568985699
torch.nn.SmoothL1Loss()(a, b), |a-b|>1, numel() == 100000 for 10000 times, dtype=torch.bfloat16
2.3354615129937883
torch.nn.SmoothL1Loss()(a, b), |a-b|<1, numel() == 100000 for 10000 times, dtype=torch.bfloat16
2.3352111729909666
```
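The numbers above show the gain is concentrated in bfloat16 (roughly 2-4.6x faster), while double and float timings are essentially unchanged. For reference, a minimal pure-Python sketch of what the kernel computes elementwise (Smooth L1 with the threshold fixed at 1 and mean reduction, matching `torch.nn.SmoothL1Loss` defaults at the time; the `smooth_l1*` names here are illustrative, not PyTorch APIs):

```python
def smooth_l1(x):
    # Elementwise Smooth L1: quadratic near zero, linear beyond |x| = 1.
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

def smooth_l1_loss(a, b):
    # Mean reduction over elementwise differences a - b,
    # mirroring torch.nn.SmoothL1Loss()(a, b) on 1-D inputs.
    return sum(smooth_l1(ai - bi) for ai, bi in zip(a, b)) / len(a)
```

With the benchmark's inputs, `|a-b| > 1` (a=1, b=3) exercises the linear branch and `|a-b| < 1` (a=1, b=1.5) the quadratic branch, which is why both cases are timed separately.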
Test Plan: Imported from OSS
Differential Revision: D21351860
Pulled By: VitalyFedyunin
fbshipit-source-id: b19ca1e58586d964972e5c495aba10c8808cd747