[pt][quant] Parallelize quantize and dequantize (#33765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33765
The quantize and dequantize methods now use multiple threads, building on shz0116's recent parallelization of the quantize/dequantize routines in FBGEMM.
Fixes:
https://github.com/pytorch/pytorch/issues/32006
https://github.com/pytorch/FBGEMM/issues/142
Alternative to https://github.com/pytorch/pytorch/pull/30153
Benchmark script:
```
#!/usr/bin/env python
import time
import torch
torch.set_num_threads(4)
# print(torch.__config__.parallel_info())
W = torch.rand(1, 54, 54, 256)
NITER = 1000
s = time.time()
for i in range(NITER):
    W_q = torch.quantize_per_tensor(W, scale=1.0, zero_point=0, dtype=torch.quint8)
time_per_iter = (time.time() - s) / NITER
print('quantize time per iter ms', time_per_iter * 1000)
s = time.time()
for i in range(NITER):
    W_deq = W_q.dequantize()
time_per_iter = (time.time() - s) / NITER
print('dequantize time per iter ms', time_per_iter * 1000)
```
### With 1 thread
quantize time per iter ms 0.22633790969848633
dequantize time per iter ms 0.6573665142059326
### With 4 threads
quantize time per iter ms 0.0905618667602539
dequantize time per iter ms 0.19511842727661133
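For reference, a quick round-trip sanity check on the same APIs (not part of the benchmark; the scale of 0.01 here is an illustrative choice, not the benchmark's scale=1.0): after quantize followed by dequantize, the per-element error should be bounded by half the quantization step.

```python
import torch

W = torch.rand(1, 54, 54, 256)  # values in [0, 1) fit the quint8 range at scale=0.01, zero_point=0
W_q = torch.quantize_per_tensor(W, scale=0.01, zero_point=0, dtype=torch.quint8)
W_deq = W_q.dequantize()
# Rounding to the nearest quantization level bounds the error by scale / 2.
assert (W - W_deq).abs().max().item() <= 0.01 / 2 + 1e-6
print('max abs round-trip error:', (W - W_deq).abs().max().item())
```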
ghstack-source-id: 98935895
Test Plan: python test/test_quantized.py
Reviewed By: jspark1105
Differential Revision: D20098521
fbshipit-source-id: bd8c45761b4651fcd5b20b95759e3868a136c048