pytorch
506996c7 - [pt][quant] Optimized qadd_scalar (#34925)

Commit
4 years ago
[pt][quant] Optimized qadd_scalar (#34925) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34925 Optimized path for qadd scalar. qadd_scalar time goes down from 55.840ms for a model to 4.637ms. ### Before ``` ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- quantize_per_tensor 0.12% 155.807us 0.12% 155.807us 155.807us 1 quantized::conv2d 25.50% 31.981ms 25.50% 31.981ms 273.343us 117 quantized::add_scalar 44.53% 55.840ms 44.53% 55.840ms 809.281us 69 quantized::relu6 1.25% 1.570ms 1.25% 1.570ms 22.749us 69 quantized::mul_scalar 10.73% 13.449ms 10.73% 13.449ms 194.914us 69 quantized::mul 16.67% 20.904ms 16.67% 20.904ms 227.220us 92 adaptive_avg_pool2d 0.03% 41.713us 0.69% 862.922us 35.955us 24 _adaptive_avg_pool2d 0.65% 821.209us 0.65% 821.209us 34.217us 24 sigmoid 0.15% 182.344us 0.15% 182.344us 7.928us 23 quantized::add 0.34% 431.939us 0.34% 431.939us 26.996us 16 dropout 0.00% 1.936us 0.00% 1.936us 1.936us 1 view 0.01% 10.281us 0.01% 10.281us 10.281us 1 dequantize 0.00% 4.562us 0.00% 4.562us 4.562us 1 ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- Self CPU time total: 125.394ms ``` ### After ``` ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- quantize_per_tensor 0.18% 130.534us 0.18% 130.534us 130.534us 1 quantized::conv2d 42.29% 31.267ms 42.29% 31.267ms 267.243us 117 quantized::add_scalar 6.27% 4.637ms 6.27% 4.637ms 67.205us 69 quantized::relu6 1.77% 1.312ms 1.77% 1.312ms 19.008us 69 quantized::mul_scalar 18.92% 13.991ms 18.92% 13.991ms 202.768us 69 quantized::mul 28.49% 21.059ms 28.49% 21.059ms 228.904us 92 adaptive_avg_pool2d 0.06% 45.242us 1.27% 942.522us 39.272us 24 _adaptive_avg_pool2d 1.21% 897.280us 1.21% 897.280us 37.387us 24 sigmoid 0.22% 160.282us 0.22% 160.282us 6.969us 23 quantized::add 0.56% 416.276us 0.56% 416.276us 26.017us 16 dropout 0.00% 1.245us 0.00% 1.245us 1.245us 1 view 0.01% 7.122us 0.01% 7.122us 7.122us 1 dequantize 0.01% 5.952us 0.01% 5.952us 5.952us 1 ------------------------- --------------- --------------- --------------- --------------- --------------- --------------- Self CPU time total: 73.930ms ``` ghstack-source-id: 100595212 Test Plan: buck test //caffe2/test:quantized -- 'test_qadd' --print-passing-details Differential Revision: D20500848 fbshipit-source-id: c292d15da121e6d13cc4eb92f10549874ff6ab0f
Author
Parents
Loading