[pytorch] force c10 schema registration for custom build
Summary:
PR #32521 has several issues with mobile builds:
1. It didn't work with static dispatch (which OSS mobile build currently uses);
2. PR #34275 fixed 1) but it doesn't fix custom build for #32521;
3. manuallyBoxedKernel has a bug with ops which only have catchAllKernel: https://github.com/ljk53/pytorch/commit/2d7ede5f71693dca308b00570dcaee6c4148864e
Both 1) and 2) have a similar root cause: some JIT-side code expects certain schemas to be registered in the JIT registry.
For example, consider this code snippet: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/frontend/builtin_functions.cpp#L10
```
auto scalar_operators_source = CodeTemplate(
R"SCRIPT(
def mul(a : ${Scalar}, b : Tensor) -> Tensor:
  return b * a
...
```
It expects "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" to be registered in JIT (it doesn't necessarily need to call the implementation); otherwise it fails a type check: https://github.com/pytorch/pytorch/pull/34013#issuecomment-592982889
Before #32521, all JIT registrations happened in register_aten_ops_*.cpp, generated by gen_jit_dispatch.py.
After #32521, for ops with full c10 templated boxing/unboxing support, JIT registrations happen in TypeDefault.cpp/CPUType.cpp/... generated by aten/gen.py, through the c10 register API via RegistrationListener in register_c10_ops.cpp. However, the c10 registrations in TypeDefault.cpp/CPUType.cpp/... are gated by `#ifndef USE_STATIC_DISPATCH`, so these schemas won't be registered in the JIT registry when USE_STATIC_DISPATCH is enabled.
PR #34275 fixes the problem by moving c10 registration out of `#ifndef USE_STATIC_DISPATCH` in TypeDefault.cpp/CPUType.cpp/..., so that all schemas can still be registered in JIT. But it doesn't fix custom build, where we only keep c10 registrations for ops used by a specific model directly (for static dispatch custom build) or indirectly (for dynamic dispatch custom build). Currently there is no way for the custom build script to know that something like "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" needs to be kept; in fact the implementation is not needed, only the schema needs to be registered in JIT.
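The "indirectly used" set for dynamic dispatch custom build is the transitive closure of the model's root ops over op-to-op call dependencies. A minimal sketch of that closure computation (the helper name and the tiny dependency graph are hypothetical, not the actual tools/ code):

```python
from collections import deque

def op_closure(root_ops, deps):
    """Transitive closure of root_ops over the deps graph.

    deps maps an op name to the ops its kernel may call.
    Illustrative only; not the actual custom build script.
    """
    seen = set(root_ops)
    queue = deque(root_ops)
    while queue:
        op = queue.popleft()
        for dep in deps.get(op, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical dependency graph: conv2d calls convolution, which calls _convolution.
deps = {
    "aten::conv2d": ["aten::convolution"],
    "aten::convolution": ["aten::_convolution"],
}
print(sorted(op_closure({"aten::conv2d"}, deps)))
# ['aten::_convolution', 'aten::conv2d', 'aten::convolution']
```

Note that even a correct closure over kernel calls can't see JIT-side schema lookups like the `builtin_functions.cpp` example above, which is why closure alone doesn't solve the problem.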
Before #32521, the problem was solved by keeping a DUMMY placeholder for unused ops in register_aten_ops_*.cpp: https://github.com/pytorch/pytorch/blob/master/tools/jit/gen_jit_dispatch.py#L326
After #32521, we can do a similar thing by forcing aten/gen.py to register ALL schema strings for selective build, which is what this PR does.
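The per-op decision can be modeled as follows. This is an illustrative sketch of the selection logic, not the actual gen.py code; the function name and return labels are made up: whitelisted ops get a full registration, and with --force_schema_registration every other op still gets a schema-only registration so JIT type checks keep working.

```python
def registration_kind(op, whitelist, force_schema_registration):
    """Decide what codegen emits for an op during selective build.

    Hypothetical model of the gen.py decision:
    - op in whitelist (or no whitelist): schema + kernel registration
    - otherwise, with --force_schema_registration: schema string only
    - otherwise: nothing is emitted for this op
    """
    if whitelist is None or op in whitelist:
        return "schema+kernel"
    if force_schema_registration:
        return "schema-only"
    return "skip"

wl = {"aten::add.Tensor"}
print(registration_kind("aten::add.Tensor", wl, True))   # schema+kernel
print(registration_kind("aten::mul.Scalar", wl, True))   # schema-only
print(registration_kind("aten::mul.Scalar", wl, False))  # skip
```

The "schema-only" branch is the new behavior: it costs only the schema string in the binary, with no kernel implementation linked in.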
Measured impact on custom build size (for MobileNetV2):
```
SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
Before: 3,404,978
After: 3,432,569
The ~28K compressed size increase is due to including more schema strings.
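The delta checks out:

```python
# Compressed build sizes (bytes) measured before/after this PR.
before, after = 3_404_978, 3_432_569
delta = after - before
print(delta)  # 27591 bytes, i.e. ~28K
```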
The table below summarizes the relationship between codegen flags and 5 build configurations that are related to mobile:
```
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------+
| | Open Source | FB BUCK |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| | Default Build | Custom Build w/ Stat-Disp | Custom Build w/ Dyna-Disp | Full-JIT | Lite-JIT |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| Dispatch Type | Static | Static | Dynamic | Dynamic (WIP) | Dynamic (WIP) |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| ATen/gen.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --op_registration_whitelist | unset | used root ops | closure(used root ops) | unset | closure(possibly used ops) |
| --backend_whitelist | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU |
| --per_op_registration | false | false | false | false | true |
| --force_schema_registration | false | true | true | false | false |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| tools/setup_helpers/generate_code.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --disable-autograd | true | true | true | false | WIP |
| --selected-op-list-path | file(used root ops) | file(used root ops) | file(used root ops) | unset | WIP |
| --disable_gen_tracing | false | false | false | false | WIP |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
```
Differential Revision: D20397421
Test Plan: Imported from OSS
Pulled By: ljk53
fbshipit-source-id: 906750949ecacf68ac1e810fd22ee99f2e968d0b