SemanticDiff

pytorch
0964b662 - qnnpack hardswish - LUTs (#36252)

Commit

4 years ago

qnnpack hardswish - LUTs (#36252)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36252

Adds a baseline hardswish kernel using LUTs in QNNPACK. Performance is 1.9 GB/s on a Nexus 6 and 2.2 GB/s on Pixel 3 - same as other LUT based ops. Enforcing scale and zp to be equal to the input, to match the server implementation.

There are some potential improvements in rewriting this as NEON kernels for a further speedup - saving that until later, if we need it.

Test Plan:
```
with-proxy ./scripts/build-local.sh
./build/local/hardswish-test

with-proxy scripts/build-android-armv7.sh
adb push ./build/android/armeabi-v7a/hardswish-* /data/qnnpack
adb shell
/data/qnnpack/hardswish-test
/data/qnnpack/hardswish-bench

with-proxy scripts/build-android-arm64.sh
adb push ./build/android/arm64-v8a/hardswish-* /data/qnnpack
/data/qnnpack/hardswish-test
/data/qnnpack/hardswish-bench
```

Imported from OSS

Differential Revision: D20965044

fbshipit-source-id: 982938361971513cb15873438e12c23a38e819e3
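
For readers unfamiliar with the LUT approach the summary mentions, here is a minimal Python sketch of the idea, not the QNNPACK C code from this commit. It assumes uint8 quantization, the standard definition hardswish(x) = x * relu6(x + 3) / 6, and an output scale and zero point equal to the input's, as the summary describes. The function names `build_hardswish_lut` and `apply_lut` are hypothetical and exist only for this illustration.

```python
def hardswish(x: float) -> float:
    """Reference float hardswish: x * relu6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def build_hardswish_lut(scale: float, zero_point: int) -> bytes:
    """Precompute the quantized output for each of the 256 uint8 inputs.

    Assumption for this sketch: the output uses the same scale and
    zero point as the input.
    """
    table = bytearray(256)
    for q in range(256):
        x = scale * (q - zero_point)             # dequantize
        y = hardswish(x)                         # apply op in float
        q_out = round(y / scale) + zero_point    # requantize
        table[q] = min(max(q_out, 0), 255)       # clamp to uint8 range
    return bytes(table)


def apply_lut(data: bytes, table: bytes) -> bytes:
    """Apply the op with one table lookup per input byte."""
    return data.translate(table)


# Example with scale = 1/16 and zero_point = 128.
table = build_hardswish_lut(0.0625, 128)
result = apply_lut(bytes([0, 128, 144, 255]), table)
```

Because a uint8 input has only 256 possible values, the table is built once per (scale, zero point) pair and the per-element cost is a single memory lookup. That is consistent with the summary's remark that throughput matches other LUT based ops.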

Author

vkuzo

Committer

facebook-github-bot
