Speed up calculate Qparams for per-channel observers (#30485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30485
Use vectorization to speed up calculate_qparams for per-channel observers. In the benchmark from the test plan (1000 channels, 100 calls), the new implementation is about 1000 times faster.
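To illustrate the idea, here is a minimal sketch contrasting a per-channel Python loop with a vectorized computation of scale and zero point. It is not the code from this PR: the function names qparams_loop and qparams_vectorized are hypothetical, and it assumes asymmetric (affine) quantization to an unsigned 8-bit range [0, 255].

```python
import torch


def qparams_loop(min_vals, max_vals, qmin=0, qmax=255):
    """Hypothetical baseline: compute qparams one channel at a time."""
    eps = torch.finfo(torch.float32).eps
    n = min_vals.numel()
    scales = torch.ones(n)
    zero_points = torch.zeros(n, dtype=torch.int64)
    for i in range(n):
        # Extend the range so that it always contains zero.
        lo = min(min_vals[i].item(), 0.0)
        hi = max(max_vals[i].item(), 0.0)
        scale = max((hi - lo) / (qmax - qmin), eps)
        zp = qmin - round(lo / scale)
        scales[i] = scale
        zero_points[i] = max(qmin, min(qmax, zp))
    return scales, zero_points


def qparams_vectorized(min_vals, max_vals, qmin=0, qmax=255):
    """Hypothetical vectorized version: one tensor op per step, no Python loop."""
    eps = torch.finfo(torch.float32).eps
    lo = torch.clamp(min_vals, max=0.0)
    hi = torch.clamp(max_vals, min=0.0)
    scales = torch.clamp((hi - lo) / float(qmax - qmin), min=eps)
    zero_points = qmin - torch.round(lo / scales)
    zero_points = torch.clamp(zero_points, qmin, qmax).to(torch.int64)
    return scales, zero_points


# Usage: per-channel min/max of a tensor with 8 channels along dim 0.
x = torch.randn(8, 16)
scales, zero_points = qparams_vectorized(x.min(dim=1).values, x.max(dim=1).values)
```

The loop version pays Python and tensor-indexing overhead once per channel, while the vectorized version issues a fixed number of tensor operations regardless of channel count, which is where a speedup of this kind would come from.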
Task:
https://github.com/pytorch/pytorch/issues/30348#event-2824868602
ghstack-source-id: 102808561
Test Plan:
```
import time

import torch
from torch.quantization.observer import PerChannelMinMaxObserver

# Observe a tensor with 1000 channels along the default channel axis.
obs = PerChannelMinMaxObserver()
X = torch.randn(1000, 10)
obs(X)

# Accumulate the wall-clock time of 100 calculate_qparams() calls.
acc_time = 0
for i in range(100):
    start = time.time()
    obs.calculate_qparams()
    acc_time = acc_time + time.time() - start
print(acc_time)
```
Before change (seconds):
20.3
After change (seconds):
0.017
Differential Revision: D18711905
fbshipit-source-id: 3ed20a6734c9950773350957aaf0fc5d14827640