update speed benchmark binary to work in USE_STATIC_DISPATCH mode (#25449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25449
Currently Variable and Tensor are still not 100% merged. There are
various places in the ATen/TH codebase that assert whether an input is a
Variable or a Tensor.
Usually, when the input is a Variable, function calls are dispatched to the
corresponding generated VariableType methods, which convert the input
Variable to a Tensor with "unpack()" before calling into LegacyTHFunctions,
and then convert the result back to a Variable with "as_variable()".
However, when USE_STATIC_DISPATCH mode is enabled, function calls are no
longer dispatched to VariableType methods. As a result, Variable inputs
remain Variables when they reach LegacyTHFunctions and fail the
"checked_tensor_unwrap" asserts. A few other asserts fail for similar reasons.
There are several options to address this problem with USE_STATIC_DISPATCH:
1. Wait until Variable and Tensor are fully merged, as planned in https://github.com/pytorch/pytorch/issues/23032;
2. Create Tensors instead of Variables upfront on the caller side (JIT);
3. Fix downstream asserts in ATen/TH to tolerate Variable inputs when AutoGrad is disabled.
Option 1 will still take some time; option 2 was tried before and caused
a lot of problems; option 3 needs to be handled case by case, as it can be
dangerous to remove asserts before the 100% merge happens.
After digging into it a bit more, it turns out that NonVariableTypeMode not
only controls how calls are dispatched, but also controls the result of
TensorImpl::is_variable(). So the problem can be addressed by:
1. Setting AutoNonVariableTypeMode right before calling forward();
2. Making sure all inputs/params are created as Variables, e.g.:
A. use torch::ones() to create the test input tensor instead of at::ones();
B. do not set AutoNonVariableTypeMode before the torch::jit::load() call (see the sketch below).
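A minimal sketch of the resulting pattern, assuming a libtorch where torch::jit::load() returns a Module by value and AutoNonVariableTypeMode lives in the at:: namespace; the exact code in the diff may differ:
```cpp
#include <torch/script.h>

int main() {
  // Load the model WITHOUT AutoNonVariableTypeMode set, so module
  // parameters are created as Variables (point 2B above).
  torch::jit::script::Module module = torch::jit::load("resnet.pb");

  // Create the test input with torch::ones() rather than at::ones(),
  // so the input is a Variable as well (point 2A above).
  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(torch::ones({1, 3, 224, 224}));

  // Enable non-Variable mode only right before forward() (point 1).
  // Per the description above, this also makes is_variable() report
  // false, so the checked_tensor_unwrap asserts in ATen/TH pass.
  at::AutoNonVariableTypeMode non_var_type_mode(true);
  auto output = module.forward(inputs);
  return 0;
}
```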
This diff applies these changes to the speed benchmark binary to demonstrate that the approach works.
Test Plan:
- Build speed benchmark binary for Android:
```
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DUSE_STATIC_DISPATCH=ON \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```
- Push binaries and model to Android device:
```
adb push build_android/bin/speed_benchmark_torch /data/local/tmp
adb push resnet.pb /data/local/tmp
```
- Run inference on device:
```
adb shell
cd /data/local/tmp
./speed_benchmark_torch --model=resnet.pb \
--input_dims="1,3,224,224" --input_type=float --print_output=true
```
Differential Revision: D17128567
Pulled By: ljk53
fbshipit-source-id: 58cc49ff35d21fefc906172cc3271f984eeb29f0