Remove index calculation in quantized max_pool2d (#25526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25526
The index calculation is unused, adds unnecessary operations to the tight inner loop, and makes vectorization extremely difficult.
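As a rough illustration of why the index bookkeeping matters, here is a minimal pure-Python sketch of max pooling with and without argmax tracking. It is not the ATen kernel touched by this PR; the function names `max_pool2d_with_indices` and `max_pool2d_values_only` are hypothetical, and the sketch assumes stride equal to the kernel size and no padding.

```python
# Illustrative sketch only; NOT the actual quantized kernel.
# Both function names are hypothetical and exist only for this example.

def max_pool2d_with_indices(img, k):
    """k x k max pooling (stride k, no padding) that also tracks the argmax."""
    h, w = len(img), len(img[0])
    out, idx = [], []
    for oy in range(h // k):
        out_row, idx_row = [], []
        for ox in range(w // k):
            best, best_i = None, -1
            for ky in range(k):
                for kx in range(k):
                    y, x = oy * k + ky, ox * k + kx
                    v = img[y][x]
                    # Every element needs a branch plus index bookkeeping.
                    if best is None or v > best:
                        best, best_i = v, y * w + x
            out_row.append(best)
            idx_row.append(best_i)
        out.append(out_row)
        idx.append(idx_row)
    return out, idx


def max_pool2d_values_only(img, k):
    """Same pooling, but the inner loop is a plain max reduction."""
    h, w = len(img), len(img[0])
    return [
        [
            max(img[oy * k + ky][ox * k + kx]
                for ky in range(k) for kx in range(k))
            for ox in range(w // k)
        ]
        for oy in range(h // k)
    ]


img = [
    [1, 5, 2, 0],
    [3, 4, 7, 1],
    [9, 2, 6, 6],
    [0, 8, 3, 5],
]
vals, idxs = max_pool2d_with_indices(img, 2)
# The pooled values are identical whether or not indices are tracked.
assert vals == max_pool2d_values_only(img, 2)
print(vals)  # [[5, 7], [9, 6]]
print(idxs)  # [[1, 6], [8, 10]]
```

The values-only variant reduces each window with nothing but `max`, which is the kind of loop body that maps naturally onto element-wise SIMD max instructions; the index-tracking variant carries an extra data-dependent update per element.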
Benchmark script
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    # Data is laid out as NHWC; permute gives an NCHW view of it.
    q_x = q_x.permute([0, 3, 1, 2])
    x = x.permute([0, 3, 1, 2])

    NITER = 100

    # Time the float kernel.
    s = time.time()
    for i in range(NITER):
        float_out = torch.max_pool2d(x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_float = (time.time() - s) / NITER

    # Time the quantized kernel.
    s = time.time()
    for i in range(NITER):
        quant_out = torch.max_pool2d(q_x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_quant = (time.time() - s) / NITER

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    # Elements read plus elements written per iteration.
    numel = x.numel() + float_out.numel()
    # 4 bytes per float element; the quantized figure counts 1 byte per
    # element, which is exact for qint8/quint8 but not for qint32.
    float_bw_gbps = (numel * 4) / time_per_iter_float / 1e9
    quant_bw_gbps = numel / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```
Before this change (AVX2)
```
$ OMP_NUM_THREADS=1 python pool_bench.py
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.6582303047180176      2.891871929168701       0.7905111729677203
GB/s float              GB/s quant
0.9685120139731342      0.30629295546107427
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.6472487449645996      2.889857292175293       0.7923389640383144
GB/s float              GB/s quant
0.9714281223323551      0.3065064847313822
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.7154507637023926      3.0337929725646973      0.8165342957045585
GB/s float              GB/s quant
0.9535962727896339      0.291964549990766
```
After this change (AVX2)
```
$ OMP_NUM_THREADS=1 python pool_bench.py
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.869810104370117       1.928541660308838       0.4983556320065849
GB/s float              GB/s quant
0.9155591371263668      0.45929005228653125
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
4.014170169830322       1.846764087677002       0.460061235459548
GB/s float              GB/s quant
0.8826332342930452      0.47962812679240213
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.983309268951416       1.848154067993164       0.4639745355448337
GB/s float              GB/s quant
0.8894714823217043      0.4792674027235246
```
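The float baseline drifts between the two runs (roughly 3.65 to 3.72 ms before, 3.87 to 4.01 ms after), so the quant/float ratios are not directly comparable across runs. A small sketch that compares the quantized timings directly, using only the numbers reported above (the variable names are illustrative):

```python
# Quantized time per iteration in ms, copied from the benchmark output above.
before_ms = {
    "qint8": 2.891871929168701,
    "quint8": 2.889857292175293,
    "qint32": 3.0337929725646973,
}
after_ms = {
    "qint8": 1.928541660308838,
    "quint8": 1.846764087677002,
    "qint32": 1.848154067993164,
}

# Speedup of the quantized kernel = old time / new time.
speedup = {name: before_ms[name] / after_ms[name] for name in before_ms}

for name, s in speedup.items():
    print(f"torch.{name}: {s:.2f}x")
# torch.qint8: 1.50x
# torch.quint8: 1.56x
# torch.qint32: 1.64x
```

These are single-threaded (OMP_NUM_THREADS=1) wall-clock measurements from one run each, so the figures should be read as approximate.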
Test Plan: Imported from OSS
Differential Revision: D17166342
Pulled By: jamesr66a
fbshipit-source-id: ce6b29349ceb4912a0dba4d085ef9a3cc1a2e965