change back to multiple_outputs_gpu_kernel for learnable fake per-channel quantization (#52017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52017
Change back to `multiple_outputs_gpu_kernel` in the C++/CUDA backward implementations of learnable fake per-channel quantization (for diff D24479735 (https://github.com/pytorch/pytorch/commit/0c60922fb0614132433779ad45ab8f30783db2ae)).
ghstack-source-id: 121409281
Test Plan:
## Unit Test:
`buck test mode/dev-nosan -c fbcode.platform=platform009 //caffe2/test:quantization -- -v TestFakeQuantize`
## Benchmark Test: (checkout f3980d1d678e)
`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerTensorOpBenchmark`
`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerChannelOpBenchmark`
### Results in **microseconds** (`1e-6` seconds)
input size: [1, 3, 256, 256]
| Benchmark                 | C++ Kernel | Non-backprop C++ Kernel |
|---------------------------|------------|-------------------------|
| Per Tensor CPU Forward    | 1372.123   | 1365.981                |
| Per Tensor CUDA Forward   | 84.586     | 27.205                  |
| Per Channel CPU Forward   | 2306.668   | 2299.991                |
| Per Channel CUDA Forward  | 154.742    | 135.219                 |
| Per Tensor CPU Backward   | 2544.617   | 581.268                 |
| Per Tensor CUDA Backward  | 304.529    | 137.335                 |
| Per Channel CPU Backward  | 2582.783   | 582.088                 |
| Per Channel CUDA Backward | 474.265    | 134.082                 |
input size: [1, 3, 512, 512]
| Benchmark                 | C++ Kernel | Non-backprop C++ Kernel |
|---------------------------|------------|-------------------------|
| Per Tensor CPU Forward    | 5426.244   | 5726.440                |
| Per Tensor CUDA Forward   | 85.834     | 26.871                  |
| Per Channel CPU Forward   | 9125.913   | 9118.152                |
| Per Channel CUDA Forward  | 159.599    | 145.117                 |
| Per Tensor CPU Backward   | 14020.830  | 2214.864                |
| Per Tensor CUDA Backward  | 285.525    | 131.302                 |
| Per Channel CPU Backward  | 14801.976  | 2104.345                |
| Per Channel CUDA Backward | 513.025    | 120.222                 |
Reviewed By: raghuramank100
Differential Revision: D26357325
fbshipit-source-id: f42e3803258b0f6b418eab1301b5e5a466671859