pytorch
4ab0ef36 - change back to multiple_outputs_gpu_kernel for learnable fake per-channel quantization (#52017)

Commit
3 years ago
change back to multiple_outputs_gpu_kernel for learnable fake per-channel quantization (#52017)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52017

Change back to multiple_outputs_gpu_kernel for per-channel quantization backward c++/cuda implementations (for diff D24479735 (https://github.com/pytorch/pytorch/commit/0c60922fb0614132433779ad45ab8f30783db2ae))
ghstack-source-id: 121409281

Test Plan:
## Unit Test:
`buck test mode/dev-nosan -c fbcode.platform=platform009 //caffe2/test:quantization -- -v TestFakeQuantize`

## Benchmark Test:
(checkout f3980d1d678e)
`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerTensorOpBenchmark`
`buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test -- --operators FakeQuantizePerChannelOpBenchmark`

### In **microseconds** (`1e-6` second),

input size: [1, 3, 256, 256]

|                           | C++ Kernel | Non-backprop C++ Kernel |
|---------------------------|------------|-------------------------|
| Per Tensor CPU Forward    | 1372.123   | 1365.981                |
| Per Tensor Cuda Forward   | 84.586     | 27.205                  |
| Per Channel CPU Forward   | 2306.668   | 2299.991                |
| Per Channel Cuda Forward  | 154.742    | 135.219                 |
| Per Tensor CPU Backward   | 2544.617   | 581.268                 |
| Per Tensor Cuda Backward  | 304.529    | 137.335                 |
| Per Channel CPU Backward  | 2582.783   | 582.088                 |
| Per Channel Cuda Backward | 474.265    | 134.082                 |

input size: [1, 3, 512, 512]

|                           | C++ Kernel | Non-backprop C++ Kernel |
|---------------------------|------------|-------------------------|
| Per Tensor CPU Forward    | 5426.244   | 5726.440                |
| Per Tensor Cuda Forward   | 85.834     | 26.871                  |
| Per Channel CPU Forward   | 9125.913   | 9118.152                |
| Per Channel Cuda Forward  | 159.599    | 145.117                 |
| Per Tensor CPU Backward   | 14020.830  | 2214.864                |
| Per Tensor Cuda Backward  | 285.525    | 131.302                 |
| Per Channel CPU Backward  | 14801.976  | 2104.345                |
| Per Channel Cuda Backward | 513.025    | 120.222                 |

Reviewed By: raghuramank100

Differential Revision: D26357325

fbshipit-source-id: f42e3803258b0f6b418eab1301b5e5a466671859
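
The backward-pass figures reported in the commit message for input size [1, 3, 256, 256] can be compared as ratios between the two benchmark columns. The following is a minimal sketch, not part of the commit: the dictionary and the helper name `column_ratio` are hypothetical, and it assumes each row lists the "C++ Kernel" time first and the "Non-backprop C++ Kernel" time second, as the table header suggests.

```python
# Backward-pass timings in microseconds for input size [1, 3, 256, 256],
# copied from the commit message.
# Assumed column order: (C++ Kernel, Non-backprop C++ Kernel).
timings_us = {
    "Per Tensor CPU Backward": (2544.617, 581.268),
    "Per Tensor Cuda Backward": (304.529, 137.335),
    "Per Channel CPU Backward": (2582.783, 582.088),
    "Per Channel Cuda Backward": (474.265, 134.082),
}


def column_ratio(first_us, second_us):
    """Return how many times larger the first column's time is than the second's."""
    return first_us / second_us


# Hypothetical helper output: one ratio per benchmark row.
ratios = {name: column_ratio(*pair) for name, pair in timings_us.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

A ratio above 1 means the first column took longer than the second for that row; the same calculation applies to the [1, 3, 512, 512] figures.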