Enabling intra-op parallelism for dynamic quantized Linear operator (#28477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28477
Similar to https://github.com/pytorch/pytorch/pull/26692, we would like to enable intra-op parallelism for the dynamic quantized Linear op.
ghstack-source-id: 92419573
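For context, the `quantized::linear_dynamic` op benchmarked below is what an `nn.Linear` module dispatches to after dynamic quantization. A minimal sketch of reaching it through the public `torch.quantization.quantize_dynamic` API (the model and shapes here are illustrative, not from this diff):

```python
import torch

# A toy fp32 model containing a Linear layer to be dynamically quantized.
fp32_model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))

# Replace nn.Linear with its dynamically quantized counterpart; weights are
# quantized to int8 ahead of time, activations are quantized on the fly.
dq_model = torch.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.rand(4, 1024)
y = dq_model(x)  # internally calls quantized::linear_dynamic
print(y.shape)
```

The intra-op parallelism enabled by this diff applies inside that single op call, so `torch.set_num_threads` controls how many threads the GEMM uses.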
Test Plan:
CI
Test Benchmark:
```
import time
import torch

# Measure dynamic quantized Linear throughput (GFLOP/s) across thread counts.
K, N = 1024, 1024
print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')
for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):
        torch.set_num_threads(num_threads)
        x = torch.rand(M, K)
        w = torch.rand(K, N)
        NITER = 20
        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)
        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER
        print("{:0.2f}".format(2.0 * M * N * K / elapsed_per_iter_dyn_quant / 1E9), end=', ')
    print("\n", end='')
```
Before this Diff (GFLOP/s):
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```
After this Diff (GFLOP/s):
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```
Differential Revision: D18074757
fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5