[Mobile] Add support for dtypes and custom classes in model tracer (#84795)
Summary: Currently, the model tracer writes only the used operators to the selected features YAML file. This change adds support for dtypes and custom classes as well.
We need to add the flag `-DENABLE_RECORD_KERNEL_FUNCTION_DTYPE` when building PyTorch in Instrumentation Mode (i.e. with `TRACING_BASED=1` for server builds) to enable capturing this data.
Test Plan: Built using `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop`
Ran the model tracer to observe this generated file: https://gist.github.com/dhruvbird/50e1860b39ae065e57d58f17e0912136
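A quick sanity check on the generated file is to confirm that it has sections beyond the operator list. The sketch below is illustrative only: the helper name `check_sections` is hypothetical, and the top-level key names `operators`, `kernel_metadata`, and `custom_classes` are assumptions that should be checked against the actual generated YAML.

```shell
# Hypothetical helper: report which expected top-level sections appear in a
# selected features YAML file. The key names are assumptions; adjust them to
# match the file the tracer actually produces.
check_sections() {
  yaml="$1"
  for key in operators kernel_metadata custom_classes; do
    # A top-level YAML key starts at column 0 and is followed by a colon.
    if grep -q "^${key}:" "$yaml"; then
      echo "${key}: present"
    else
      echo "${key}: missing"
    fi
  done
}
```

For example, `check_sections /tmp/selected_ops.yaml` prints one line per section, so a file produced without dtype tracing would be expected to show the dtype section as missing.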
Then used the generated YAML to build PyTorch (minimal build) using the command:
```
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN=1 \
USE_LIGHTWEIGHT_DISPATCH=0 BUILD_LITE_INTERPRETER=1 \
SELECTED_OP_LIST=/tmp/selected_ops.yaml \
TRACING_BASED=1 \
./scripts/build_mobile.sh
```
After that I generated a binary using this command:
```
g++ /tmp/main.cpp -L build_mobile/lib/ -I build_mobile/install/include/ -ffunction-sections -fdata-sections -Wl,--gc-sections \
-lpthread -lc10 -Wl,--whole-archive -ltorch_cpu -Wl,--no-whole-archive -ltorch -lXNNPACK \
-lpytorch_qnnpack -lcpuinfo -lclog -lpthreadpool -lkineto -lfmt -ldl -lc10
```
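The commands used to measure the binary are not recorded here. One possible way to obtain unstripped and stripped sizes, assuming GNU binutils `strip` is available, is sketched below; the helper name `report_sizes` is hypothetical.

```shell
# Hypothetical helper: print the unstripped and stripped size of a binary.
# Assumes a `strip` that supports `-o` (e.g. GNU binutils); the original
# binary is left untouched because the stripped copy goes to a temp file.
report_sizes() {
  bin="$1"
  stripped="$(mktemp)"
  strip -o "$stripped" "$bin" || { rm -f "$stripped"; return 1; }
  # `wc -c` counts bytes; `tr` removes padding that some platforms add.
  printf 'unstripped: %s bytes\n' "$(wc -c < "$bin" | tr -d ' ')"
  printf 'stripped: %s bytes\n' "$(wc -c < "$stripped" | tr -d ' ')"
  rm -f "$stripped"
}
```

Since the `g++` command above passes no `-o` option, the output lands in `a.out`, so the call would be `report_sizes ./a.out`.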
The table below shows the binary size in each build mode; the percentages in parentheses are the reported size relative to the Standard build.
| Build Type | Unstripped | Stripped |
| ----------- | ----------- | ----------- |
| Standard | 49MiB | 34MiB |
| Minimal w/o dtype | 6.1MiB (12%) | 4.5MiB (18%) |
| Minimal w/ dtype | 3.7MiB (7%) | 2.7MiB (11%) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84795
Approved by: https://github.com/cccclai