[quant][core][gpu][feature] Implemented quantized cuda adaptive average pool2d op (#76081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76081
The current implementation of quantized CUDA adaptive average pooling runs the following pipeline:
dequant -> fp32 adaptive average pooling -> quant. The result is numerically the same as quantized adaptive average pooling. This is not the ideal implementation, since we would prefer to operate on the quantized values directly. However, that is currently blocked on the cuDNN 8.5.0 release, which is anticipated to support adaptive average pooling. Once that support is available, we will use it directly.
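For illustration only, the dequant -> fp32 pool -> quant pipeline can be sketched in plain Python on a single 2D plane. This is not the CUDA kernel from this PR. The helper names, the uint8 range, and the choice to reuse the input scale and zero point for the output are assumptions made for the sketch.

```python
def ceil_div(a, b):
    # Integer ceiling division for non-negative a and positive b.
    return -(-a // b)

def dequantize(q, scale, zero_point):
    # Per-tensor affine dequantization: x = (q - zero_point) * scale.
    return [[(v - zero_point) * scale for v in row] for row in q]

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Per-tensor affine quantization with clamping (uint8 range assumed).
    return [[min(qmax, max(qmin, round(v / scale) + zero_point)) for v in row]
            for row in x]

def adaptive_avg_pool2d(x, out_h, out_w):
    # Output cell (i, j) averages the input window
    # [floor(i*H/out_h), ceil((i+1)*H/out_h)) x [floor(j*W/out_w), ceil((j+1)*W/out_w)).
    in_h, in_w = len(x), len(x[0])
    out = []
    for i in range(out_h):
        h0, h1 = (i * in_h) // out_h, ceil_div((i + 1) * in_h, out_h)
        row = []
        for j in range(out_w):
            w0, w1 = (j * in_w) // out_w, ceil_div((j + 1) * in_w, out_w)
            window = [x[h][w] for h in range(h0, h1) for w in range(w0, w1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def quantized_adaptive_avg_pool2d(q, scale, zero_point, out_h, out_w):
    # Hypothetical wrapper: dequant -> fp32 adaptive average pooling -> quant.
    x = dequantize(q, scale, zero_point)
    y = adaptive_avg_pool2d(x, out_h, out_w)
    return quantize(y, scale, zero_point)

q_in = [[10, 12, 14, 16],
        [12, 14, 16, 18],
        [20, 22, 24, 26],
        [22, 24, 26, 28]]
q_out = quantized_adaptive_avg_pool2d(q_in, scale=0.5, zero_point=10, out_h=2, out_w=2)
```

Because the intermediate values are held in floating point, the only rounding happens in the final quant step.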
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_adaptive_avg_pool
```
Differential Revision:
D35768751
Reviewed By: jerryzh168
Pulled By: dzdang
fbshipit-source-id: ad06fd06d6941b92105bcabf0fd54b9e27a029d5
(cherry picked from commit 4e1805dd62a9d5e94c61340ac46bcd7aa4e49dd9)