quantize_tensor_per_channel ARM implementation (#46018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46018
Currently on mobile devices quantize_tensor has a vectorized implementation using ARM intrinsics; however, quantize_tensor_per_channel does not. This change adds a matching ARM-vectorized implementation for quantize_tensor_per_channel.
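For reference, a scalar sketch of what the per-channel kernel computes (and what the NEON version vectorizes): each channel has its own scale and zero point, and every element is mapped by `q = clamp(round(x / scale) + zero_point, 0, 255)`. The function name, layout assumption (contiguous `[channels, elements_per_channel]`), and signature below are illustrative, not PyTorch's internal API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Scalar reference for per-channel affine quantization to quint8.
// Assumes src is laid out as [channels, elements_per_channel] contiguously.
std::vector<uint8_t> quantize_per_channel(const std::vector<float>& src,
                                          const std::vector<float>& scales,
                                          const std::vector<int32_t>& zero_points,
                                          size_t elements_per_channel) {
  std::vector<uint8_t> dst(src.size());
  for (size_t c = 0; c < scales.size(); ++c) {
    const float inv_scale = 1.0f / scales[c];
    for (size_t i = 0; i < elements_per_channel; ++i) {
      const size_t idx = c * elements_per_channel + i;
      // q = clamp(round(x / scale) + zero_point, 0, 255)
      const int32_t q =
          static_cast<int32_t>(std::nearbyint(src[idx] * inv_scale)) + zero_points[c];
      dst[idx] = static_cast<uint8_t>(std::min(255, std::max(0, q)));
    }
  }
  return dst;
}
```

The vectorized kernel performs the same multiply/round/add/clamp sequence on NEON lanes, reloading the scale and zero point at each channel boundary.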
Test Plan:
Build for ARM NEON
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI="armeabi-v7a with NEON" ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Build for ARM64
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```
Then run the benchmark binary over adb shell. Note that the Android CPU is not frequency-locked by default, which can lead to noisy benchmark results; this can be changed by running the following for every CPU.
```
adb shell "echo userspace > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_governor"
adb shell "echo '2000000' > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_setspeed"
adb push build_android/bin/quantize_per_channel /data/local/tmp/
adb shell "/data/local/tmp/quantize_per_channel"
```
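Since the governor commands above must be repeated for each CPU, a small helper loop can apply them to a list of cores. This is a sketch: `lock_cpus` is a hypothetical wrapper (not part of the repo), the 2000000 kHz setpoint is the one used above, and the CPU list depends on the SoC.

```shell
# lock_cpus ADB_BIN CPU... : set each listed cpu to a fixed 2 GHz userspace
# governor via the given adb binary (hypothetical helper, not in the repo).
lock_cpus() {
  adb_bin="$1"
  shift
  for cpu in "$@"; do
    "$adb_bin" shell "echo userspace > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_governor"
    "$adb_bin" shell "echo '2000000' > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_setspeed"
  done
}

# On a real device (core list varies per SoC):
# lock_cpus adb cpu0 cpu1 cpu2 cpu3
```

Writing to these sysfs nodes generally requires a rooted device or an engineering build.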
Resulting benchmarks are located [here](https://gist.github.com/AJLiu/d1711bb6a5e93b3338eca2c14c8aec9f)
Google spreadsheet comparing results [here](https://docs.google.com/spreadsheets/d/1Ky-rEu2CqOqex2a84b67hB1VLAlfEDgAN2ZXe8IlGF8/edit?usp=sharing)
Reviewed By: kimishpatel
Differential Revision: D24286528
fbshipit-source-id: 5481dcbbff8345a2c0d6cc9b7d7f8075fbff03b3