[PyTorch Edge] Optimize Dequantize Tensor with Intrinsics (#65844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65844
Benchmarked on the [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171), with the config from rebasing onto v4 of D31869106:
Before:
[AIBench Run (128ms)](https://www.internalfb.com/intern/aibench/details/309792316534505)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881079420.html)
After:
[AIBench Run (117ms)](https://www.internalfb.com/intern/aibench/details/20433505461364)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881527831.html)
Total events spent in at::native::dequantize_quantized dropped from 1.97 billion to 0.97 billion (~50% reduction).
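For readers unfamiliar with the technique, the following is a minimal sketch of what dequantizing with intrinsics can look like. It is not the kernel from this diff. It assumes per-tensor affine quantization of unsigned 8-bit values (x = scale * (q - zero_point)), and the function names `dequantize_scalar` and `dequantize_vectorized` are hypothetical. The NEON path is only compiled on ARM targets; elsewhere the function falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#endif

// Scalar reference (hypothetical helper): x = scale * (q - zero_point).
void dequantize_scalar(const uint8_t* q, float* out, std::size_t n,
                       float scale, int32_t zero_point) {
  assert(n == 0 || (q != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    const int32_t centered = static_cast<int32_t>(q[i]) - zero_point;
    out[i] = scale * static_cast<float>(centered);
  }
}

// Hypothetical vectorized variant: 8 elements per iteration with NEON,
// then the scalar loop for the remaining tail (or for everything on
// non-ARM targets, where the NEON section is compiled out).
void dequantize_vectorized(const uint8_t* q, float* out, std::size_t n,
                           float scale, int32_t zero_point) {
  std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__aarch64__)
  const int32x4_t zp = vdupq_n_s32(zero_point);
  const float32x4_t sc = vdupq_n_f32(scale);
  for (; i + 8 <= n; i += 8) {
    // Load 8 bytes and widen u8 -> u16 -> 32-bit lanes.
    const uint16x8_t w = vmovl_u8(vld1_u8(q + i));
    const int32x4_t lo = vreinterpretq_s32_u32(vmovl_u16(vget_low_u16(w)));
    const int32x4_t hi = vreinterpretq_s32_u32(vmovl_u16(vget_high_u16(w)));
    // Subtract the zero point, convert to float, multiply by the scale.
    vst1q_f32(out + i, vmulq_f32(vcvtq_f32_s32(vsubq_s32(lo, zp)), sc));
    vst1q_f32(out + i + 4, vmulq_f32(vcvtq_f32_s32(vsubq_s32(hi, zp)), sc));
  }
#endif
  dequantize_scalar(q + i, out + i, n - i, scale, zero_point);
}
```

Both paths perform the same integer subtraction, int-to-float conversion, and single multiply per element, so they should agree exactly; the vector path simply processes several elements per instruction.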
ghstack-source-id: 142166373
Test Plan:
To run quantized_test:
- Clone open source repo
- Set ANDROID_NDK and ANDROID_SDK
- Build with ```BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_LITE_INTERPRETER=0 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON```
- Move ```build_android/bin/quantized_test``` to devserver
- Use One World to connect to an Android device (e.g. ```one_world android device pixel-3a```)
- In another terminal: make quantized_test executable (```chmod +x quantized_test```), copy it to the Android device (```adb push quantized_test /data/local/tmp```), and run it (```adb shell /data/local/tmp/quantized_test```)
Results:
{F676102702}
```buck test mode/dev //caffe2/aten:quantized_test``` also passes.
To test performance on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171) with AI Bench:
- Save this config file: P466124028 (for example: D31869106)
- Run the following both before and after applying the changes in this diff: ```buck run aibench:run_bench -- -b benchmark_mobile_vision_transformer_model_config.json --platform android/arm64 --framework pytorch --remote --devices Pixel-3a-11-30 --force_profile```
Reviewed By: kimishpatel
Differential Revision: D31066997
fbshipit-source-id: 9067e683e0181aa13a2b636b68ac4fe5a4b2e618