Xray image inference on multi-cpu and dumping dnnlowp tensors (#22537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22537
Enable multi-CPU model evaluation.
Dump intermediate tensors in conv dnnlowp operators for debugging.
Test Plan:
Run locally and dump tensors:
```
buck run mode/opt experimental/summerdeng/xray_image:test_net_quantization -- --model_path=/mnt/public/summerdeng/xray_image/models/oct_resnext101_50.mdl --batch_size=1 --test_max_images=100 --octave_conv --octave_conv_ratio=0.5 --output_dir=/mnt/public/summerdeng/xray_image/output --num_cpus=4 --caffe2_dnnlowp_dump_tensors
```
Dumped .mtx files can be found here: /mnt/public/summerdeng/xray_image/dump_tensors
Histogram plots can be found here: https://our.intern.facebook.com/intern/anp/view/?id=112033
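To inspect a dumped tensor without the internal plotting page, the sketch below reads one .mtx file and bins its values. It assumes the dumps use the dense Matrix Market "array" layout (banner line, optional % comments, a "rows cols" line, then values in column-major order) with integer or real entries; this is an assumption about the dump format, not something stated in this change. The names read_mtx_array and histogram are hypothetical helpers written for this illustration. If the files turn out to use the sparse "coordinate" layout, scipy.io.mmread handles both variants.

```python
import os
import tempfile
from collections import Counter


def read_mtx_array(path):
    """Read a dense Matrix Market 'array' file into a list of rows."""
    with open(path) as f:
        # Banner: %%MatrixMarket matrix array <field> <symmetry>
        header = f.readline().split()
        if len(header) < 5 or header[0] != "%%MatrixMarket" or header[2] != "array":
            raise ValueError("expected a Matrix Market file in 'array' format")
        if header[3] not in ("integer", "real"):
            raise ValueError("only integer and real fields are handled here")
        if header[4] != "general":
            raise ValueError("only 'general' symmetry is handled here")
        convert = int if header[3] == "integer" else float
        # Skip comment lines, then collect every remaining token.
        tokens = []
        for line in f:
            if line.startswith("%"):
                continue
            tokens.extend(line.split())
    rows, cols = int(tokens[0]), int(tokens[1])
    values = [convert(t) for t in tokens[2:]]
    if len(values) != rows * cols:
        raise ValueError("value count does not match the declared shape")
    # The array format stores values column by column (column-major).
    return [[values[c * rows + r] for c in range(cols)] for r in range(rows)]


def histogram(matrix, num_bins=8):
    """Count values in equal-width bins spanning [min, max]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant tensors
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in flat)
    return [counts.get(b, 0) for b in range(num_bins)]


if __name__ == "__main__":
    # Tiny hand-written 2x3 example standing in for a dumped tensor.
    sample = "%%MatrixMarket matrix array integer general\n2 3\n1\n2\n3\n4\n5\n6\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.mtx")
        with open(path, "w") as f:
            f.write(sample)
        matrix = read_mtx_array(path)
    print(matrix)  # [[1, 3, 5], [2, 4, 6]]
    print(histogram(matrix, num_bins=5))  # [1, 1, 1, 1, 2]
```

Comparing the bin counts of the same tensor dumped from different runs or layers is a quick way to spot value ranges that a chosen quantization range would clip.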
Example flow runs for model evaluation:
f124056759 Evaluating fp32 Oct-ResNext101 with 16 CPUs
```
fry flow-cpu --resource '{"cpu_core": 16}' --binary-type local --name [quantization_eval]oct_resnext101_0.5_fp32_16cpu --disable-source-snapshot true --distribute-to-local-dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --flow-entitlement gpu_prod ~/fbsource/fbcode/buck-out/gen/experimental/summerdeng/xray_image/test_net_quantization.par --test_data="/mnt/vol/gfsai-oregon/ai-group/users/zyan3/octconv/xray_v11_annotation_data_fullfeat_32x4_dedup_split_05202019_posonly_labeled_2018_05_29_test.csv" --batch_size=1 --model_path="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/oct_resnext101_50.mdl" --octave_conv --octave_conv_ratio=0.5 --test_max_images=-1 --output_dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --num_cpus=16
```
f124275053 Evaluating int8 Oct-ResNext101 with 16 CPUs
```
fry flow-cpu --resource '{"cpu_core": 16}' --binary-type local --name [quantization_eval]oct_resnext101_0.5_int8_nongroupwise_l2approx --disable-source-snapshot true --distribute-to-local-dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --flow-entitlement gpu_prod ~/fbsource/fbcode/buck-out/gen/experimental/summerdeng/xray_image/test_net_quantization.par --test_data="/mnt/vol/gfsai-oregon/ai-group/users/zyan3/octconv/xray_v11_annotation_data_fullfeat_32x4_dedup_split_05202019_posonly_labeled_2018_05_29_test.csv" --batch_size=1 --model_path="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/oct_resnext101_50.mdl" --octave_conv --octave_conv_ratio=0.5 --test_max_images=-1 --int8_model_saved --int8_model_type="mdl" --output_dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --int8_model_mdl_name="int8_oct_resnext101_50_nongroupwise_l2approx.mdl" --num_cpus=16
```
Reviewed By: stephenyan1231
Differential Revision: D16106577
fbshipit-source-id: 9de359f2afe7f9a7722ae404f0d9aeca1d9c3c75