Vectorize int8_t on CPU (#44759)
Summary:
int8_t is not vectorized in vec256_int.h. This PR adds vectorization for
int8_t. As pointed out in https://github.com/pytorch/pytorch/issues/43033, this is an important type to vectorize
because many images are loaded in this data type.
Related issue: https://github.com/pytorch/pytorch/issues/43033
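For context, vectorizing int8_t on AVX2 means operating on 32 bytes per instruction instead of one element at a time. The snippet below is a minimal illustrative sketch of that idea (element-wise int8 addition via _mm256_add_epi8 with a scalar tail); the function name add_int8_avx2 is invented for this example, and it is not the PR's actual Vec256<int8_t> implementation:
```cpp
// Illustrative sketch only (assumes AVX2; compile with -mavx2).
// Not the PR's code: add_int8_avx2 is a hypothetical helper for this example.
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

// Element-wise out[i] = a[i] + b[i] for int8_t, 32 lanes per AVX2 register.
void add_int8_avx2(const int8_t* a, const int8_t* b, int8_t* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 32 <= n; i += 32) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
    __m256i vc = _mm256_add_epi8(va, vb);  // 32 int8 additions in one instruction
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), vc);
  }
  // Scalar tail for the remaining (n % 32) elements.
  for (; i < n; ++i) {
    out[i] = static_cast<int8_t>(a[i] + b[i]);
  }
}
```
Subtraction follows the same pattern with _mm256_sub_epi8; the actual Vec256 wrapper exposes these operations through operator overloads used by the TensorIterator-based kernels.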
Benchmark (Debian Buster, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Turbo off, Release build):
```python
import timeit
dtype = 'torch.int8'
for op in ('+', '-'):
    for n, t in [(10_000, 200000),
                 (100_000, 20000)]:
        print(f'a {op} b, numel() == {n} for {t} times, dtype={dtype}')
        print(timeit.timeit(f'c = a {op} b',
                            setup=f'import torch; a = torch.arange(1, {n}, dtype={dtype}); b = torch.arange({n}, 1, -1, dtype={dtype})',
                            number=t))
```
Results:
Before:
```
a + b, numel() == 10000 for 200000 times, dtype=torch.int8
1.2223373489978258
a + b, numel() == 100000 for 20000 times, dtype=torch.int8
0.6108450189931318
a - b, numel() == 10000 for 200000 times, dtype=torch.int8
1.256775538000511
a - b, numel() == 100000 for 20000 times, dtype=torch.int8
0.6101213909860235
```
After (roughly 1.5-2.2x faster):
```
a + b, numel() == 10000 for 200000 times, dtype=torch.int8
0.5713336059998255
a + b, numel() == 100000 for 20000 times, dtype=torch.int8
0.39169703199877404
a - b, numel() == 10000 for 200000 times, dtype=torch.int8
0.5838428330025636
a - b, numel() == 100000 for 20000 times, dtype=torch.int8
0.37486923701362684
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44759
Reviewed By: malfet
Differential Revision: D23786383
Pulled By: glaringlee
fbshipit-source-id: 67f5bcd344c0b5014bacbc876143231fca156713