pytorch
54217e69 - [Quant] Add fast path of qmean/qstd for quantized CPU (reopen #70172) (#80579)

Commit

2 years ago

> Note: This is a reopen of https://github.com/pytorch/pytorch/pull/70172, which was merged and then reverted.

Add a fast path for qmean and qstd on quantized CPU tensors when the reduction is done over the innermost dimensions. The fast path supports inputs in contiguous memory format. For example:

```python
import torch

X = torch.randn((2, 3, 4, 5), dtype=torch.float)
scale, zero_point, torch_type = 0.1, 0, torch.quint8  # example quantization parameters
qX = torch.quantize_per_tensor(X, scale, zero_point, torch_type)
# dim can be: -1, (-1, -2), (-1, -2, -3), (-1, -2, -3, -4), 3, (3, 2), (3, 2, 1), (3, 2, 1, 0) or None
dim = -1
qY = torch.mean(qX, dim)
# qY = torch.std(qX, dim)
```

**Performance test results**

Test environment:
- Intel® Xeon® CLX-8260
- 1 instance, 4 cores
- Jemalloc

Test method: create 4D contiguous tensors as inputs, set `dim` to the innermost two dimensions `(-1, -2)`, then run the following tests:
- Quantize inputs and use the fast path
- Quantize inputs and use the reference path
- Use the fp32 kernel (no quantization)

Mean: execution time (µs) vs. shape

![image](https://user-images.githubusercontent.com/12522207/148152617-604f2841-cfcd-495c-ae88-c27d9165b46a.png)

Std: execution time (µs) vs. shape

![image](https://user-images.githubusercontent.com/12522207/148152632-3a8dceb1-0057-42c9-af65-1e26d697ff0c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80579
Approved by: https://github.com/malfet
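Why the fast path is limited to the innermost dimensions of a contiguous tensor can be illustrated with a plain-Python sketch. This is not the ATen kernel from this PR. In a row-major (contiguous) buffer, reducing over the trailing `k` dimensions means every output element comes from one contiguous chunk of `prod(shape[-k:])` values, so no strided gathering is needed. For per-tensor quantization, the dequantized mean is `scale * (mean(q) - zero_point)`, and the dequantized std is `scale * std(q)` because the zero-point shift cancels out. The helper names `innermost_chunk_size` and `quantized_mean_std_innermost` are hypothetical. The input is assumed to be a flat list of integer quantized values, and results are returned as floats rather than being requantized.

```python
import math
from typing import List, Sequence, Tuple, Union

Dims = Union[int, Sequence[int], None]


def innermost_chunk_size(shape: Sequence[int], dims: Dims) -> int:
    """Size of the contiguous chunk reduced per output element (hypothetical helper).

    Raises ValueError unless `dims` are exactly the trailing dimensions.
    """
    ndim = len(shape)
    if dims is None:  # reduce over all dimensions
        dims = tuple(range(ndim))
    elif isinstance(dims, int):
        dims = (dims,)
    normalized = sorted(d % ndim for d in dims)  # map negative dims to positive
    k = len(normalized)
    if normalized != list(range(ndim - k, ndim)):
        raise ValueError("fast path applies only to the innermost dimensions")
    return math.prod(shape[ndim - k:])


def quantized_mean_std_innermost(
    q_values: List[int], shape: Sequence[int], dims: Dims,
    scale: float, zero_point: int, correction: int = 1,
) -> Tuple[List[float], List[float]]:
    """Mean/std over the innermost dims of a flat, row-major quantized buffer.

    Returns dequantized (float) results; requantization is omitted in this sketch.
    """
    chunk = innermost_chunk_size(shape, dims)
    denom = chunk - correction  # correction=1 gives the unbiased (sample) std
    means, stds = [], []
    for start in range(0, len(q_values), chunk):
        block = q_values[start:start + chunk]  # one contiguous slice per output
        mean_q = sum(block) / chunk  # mean in the integer domain
        if denom > 0:
            var_q = sum((v - mean_q) ** 2 for v in block) / denom
        else:
            var_q = float("nan")
        means.append(scale * (mean_q - zero_point))  # zero point shifts the mean
        stds.append(scale * math.sqrt(var_q))  # zero point does not affect std
    return means, stds
```

For example, a `(2, 3)` buffer `[10, 12, 14, 20, 20, 20]` with `scale=0.5`, `zero_point=10` and `dims=-1` reduces each row of three values independently, giving means `[1.0, 5.0]` and stds `[1.0, 0.0]`. Reducing over an outer dimension such as `dims=0` is rejected here, because those elements are strided rather than contiguous. That is the case the fast path in this PR does not cover.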

Author

Xia-Weiwen

Committer

pytorchmergebot

Parents

4c279994
