pytorch
729f8425 - Use Caffe2's implementation of grouped depthwise 3x3 convolutions (#26556)

Commit · 6 years ago
Summary: Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26556

Test Plan:

_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below were taken on a Pixel 2.

**Before**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425

Differential Revision: D17533311

Pulled By: AshkanAliabadi

fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d
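For context, the operator affected here is a depthwise convolution: a grouped convolution whose group count equals the number of input channels, with a 3x3 kernel, as used in MobileNet-style architectures. The sketch below is only an illustration of that operator shape in PyTorch; the channel count, input size, and names are made up for the example and are not taken from the commit or from xraymobilev3.pt.

```python
import torch
import torch.nn as nn

# Hypothetical depthwise (grouped, groups == in_channels) 3x3 convolution.
# Shapes like this are the ones the commit routes to Caffe2's optimized
# mobile kernel instead of NNPACK.
channels = 32
depthwise = nn.Conv2d(
    in_channels=channels,
    out_channels=channels,
    kernel_size=3,
    stride=1,
    padding=1,
    groups=channels,  # one filter per input channel -> depthwise
    bias=False,
)

x = torch.randn(1, channels, 224, 224)
with torch.no_grad():
    y = depthwise(x)
print(y.shape)  # torch.Size([1, 32, 224, 224])
```

Dispatching convolutions of this shape to Caffe2's grouped depthwise 3x3 kernel is what produces the mobile latency improvements reported in the benchmarks above.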
Author: Ashkan Aliabadi