a09c4d39 - [pt][quant] Vectorized qmul and more methods on qint data types (#34376)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34376

Vectorized implementation of qmul. qmul is now ~16x faster on my development machine. This implementation works for qint8, quint8, and qint32. Also added some commonly used operations, such as the multiply operator and the requantize operation, to the qint vector classes for future use.

```
#!/usr/bin/env python
import time

import torch
import torch.nn as nn

torch.set_num_threads(1)
# print(torch.__config__.parallel_info())

A = torch.rand(1, 54, 54, 256)
B = torch.rand(1, 54, 54, 256)

scale = .05
zero_point = 50

for dtype in [torch.quint8, torch.qint8]:
    qA = torch.quantize_per_tensor(A, scale=scale, zero_point=zero_point, dtype=dtype)
    qB = torch.quantize_per_tensor(B, scale=scale, zero_point=zero_point, dtype=dtype)

    NITER = 1000

    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.mul(qA, qB, scale=scale, zero_point=zero_point)
    time_per_iter = (time.time() - s) / NITER
    print('dtype: {} time per iter ms: {:.3f}'.format(dtype, time_per_iter * 1000))
```

### Before
dtype: torch.quint8 time per iter ms: 6.714
dtype: torch.qint8 time per iter ms: 6.780

### After
dtype: torch.quint8 time per iter ms: 0.431
dtype: torch.qint8 time per iter ms: 0.417

### Test
Modified qmul tests to include qint8 and qint32 data types.

```
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_same_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_broadcast
```

ghstack-source-id: 99862681

Differential Revision: D20308515

fbshipit-source-id: 4fa65b2ba433cfd59260fc183a70f53a6fcc36b4
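For context on what the kernel computes: a quantized multiply does the arithmetic in the integer domain (after subtracting zero points) and then requantizes the 32-bit product to the output scale and zero point. The sketch below is illustrative only, not the actual vectorized C++ implementation; the function name `requantize_mul`, the NumPy formulation, and the quint8 clamp bounds are all assumptions for demonstration.

```python
import numpy as np

def requantize_mul(qa, qb, a_scale, a_zp, b_scale, b_zp, out_scale, out_zp):
    """Multiply two quantized arrays in the integer domain, then
    requantize the product to the output scale/zero point.
    Illustrative sketch; the real kernel uses vectorized qint types."""
    # Shift by zero points so the int32 product is exact.
    prod = (qa.astype(np.int32) - a_zp) * (qb.astype(np.int32) - b_zp)
    # Requantize: scale the integer product into the output domain.
    multiplier = (a_scale * b_scale) / out_scale
    out = np.rint(prod * multiplier) + out_zp
    # Clamp to the quint8 range [0, 255].
    return np.clip(out, 0, 255).astype(np.uint8)

# Quantize real values r as q = round(r / scale) + zp (quint8).
scale, zp = 0.05, 50
a = np.array([0.5, 1.0], dtype=np.float32)
b = np.array([0.2, 0.1], dtype=np.float32)
qa = np.clip(np.rint(a / scale) + zp, 0, 255).astype(np.uint8)
qb = np.clip(np.rint(b / scale) + zp, 0, 255).astype(np.uint8)

qc = requantize_mul(qa, qb, scale, zp, scale, zp, scale, zp)
# Dequantizing qc, i.e. (qc - zp) * scale, approximates a * b.
```

Keeping the product in int32 and folding the three scales into a single multiplier is what makes the per-element work cheap enough to vectorize.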