[PyTorch Mobile] Record dtypes for tensors used in kernel function implementations (#48826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48826
This change updates various macros to pass the kernel tag string (`const char*`) into the macro that sets up the `case` statement for the dtype switch. That macro already receives the dtype (an enum value), which we also need.
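To make the plumbing concrete, here is a minimal sketch of a dispatch macro whose per-dtype `case` macro receives the kernel tag string alongside the dtype enum. Every name here (`ScalarType`, `on_kernel_dtype`, `SKETCH_PRIVATE_CASE_TYPE`, `SKETCH_DISPATCH_FLOATING_TYPES`, `element_size`) is a hypothetical, simplified stand-in, only loosely modeled on the ATen dispatch macros, not the actual code in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for the dtype enum.
enum class ScalarType { Float, Double, Int };

// Hypothetical per-case hook. It is a no-op here; it only shows that the
// kernel tag and the dtype are both in scope at the case site.
inline void on_kernel_dtype(const char* /*kernel_tag*/, ScalarType /*dtype*/) {}

// The case macro now takes the kernel tag string (NAME) in addition to the
// dtype enum it already received.
#define SKETCH_PRIVATE_CASE_TYPE(NAME, enum_type, type, ...) \
  case enum_type: {                                          \
    on_kernel_dtype(NAME, enum_type);                        \
    using scalar_t = type;                                   \
    return __VA_ARGS__();                                    \
  }

// The outer dispatch macro forwards its NAME argument to every case.
#define SKETCH_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...)                       \
  [&] {                                                                       \
    switch (TYPE) {                                                           \
      SKETCH_PRIVATE_CASE_TYPE(NAME, ScalarType::Float, float, __VA_ARGS__)   \
      SKETCH_PRIVATE_CASE_TYPE(NAME, ScalarType::Double, double, __VA_ARGS__) \
      default:                                                                \
        throw std::runtime_error(std::string(NAME) + ": unsupported dtype");  \
    }                                                                         \
  }()

// Example kernel: returns the element size of the dispatched dtype.
inline std::size_t element_size(ScalarType t) {
  return SKETCH_DISPATCH_FLOATING_TYPES(t, "element_size", [&] {
    return sizeof(scalar_t);
  });
}
```

Because the lambda body is expanded textually inside each `case`, it sees the per-case `scalar_t` alias, and the `NAME` string travels with it to every case site.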
There are two phases we need to build out for `dtype` tracing to work:
1. Recording Phase
2. Conditional Compilation Phase
For the most part, this change focuses on [1] (the Recording Phase) and sets up a new `RecordScope` enum value to track kernel dtypes. This code is compiled in only if a specific macro is defined, since this is an **extremely** hot code path and even the slightest regression here can cause a large overall slowdown.
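The opt-in recording can be pictured with the following sketch. It assumes a hypothetical build macro (`SKETCH_ENABLE_KERNEL_DTYPE_RECORDING`), a hypothetical in-memory event sink (`recorded_events`), and an illustrative `"<kernel tag>$<dtype>"` naming scheme; only the idea of a dedicated `RecordScope` value and of compiling the hook in behind a macro comes from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scope enum with a dedicated value for kernel dtypes.
enum class RecordScope { FUNCTION, KERNEL_FUNCTION_DTYPE };

enum class ScalarType { Float, Double };  // simplified stand-in

inline const char* dtype_name(ScalarType t) {
  switch (t) {
    case ScalarType::Float: return "Float";
    case ScalarType::Double: return "Double";
  }
  return "Unknown";
}

struct RecordedEvent {
  RecordScope scope;
  std::string name;  // "<kernel tag>$<dtype>" in this sketch
};

// Hypothetical sink standing in for whatever consumes scoped record events.
inline std::vector<RecordedEvent>& recorded_events() {
  static std::vector<RecordedEvent> events;
  return events;
}

// Stands in for passing -DSKETCH_ENABLE_KERNEL_DTYPE_RECORDING at build time.
#define SKETCH_ENABLE_KERNEL_DTYPE_RECORDING

#ifdef SKETCH_ENABLE_KERNEL_DTYPE_RECORDING
#define SKETCH_RECORD_KERNEL_FUNCTION_DTYPE(NAME, enum_type)      \
  recorded_events().push_back({RecordScope::KERNEL_FUNCTION_DTYPE, \
                               std::string(NAME) + "$" + dtype_name(enum_type)})
#else
// Default builds: the macro expands to a no-op, so the hot path pays nothing.
#define SKETCH_RECORD_KERNEL_FUNCTION_DTYPE(NAME, enum_type) ((void)0)
#endif

// Example call site: the dtype case of a hypothetical kernel.
inline void add_kernel_case(ScalarType t) {
  SKETCH_RECORD_KERNEL_FUNCTION_DTYPE("add_kernel", t);
  // ... dtype-specific kernel body would run here ...
}
```

Gating at the preprocessor level, rather than with a runtime flag, means default builds do not even pay for a branch on this path.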
I have only added a skeleton of phase [2] (the Conditional Compilation Phase): a no-op `constexpr` method that selects every dtype in the kernel implementation. In subsequent diffs, this will be updated to point to a code-generated function based on the results of tracing the requested models.
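A rough sketch of how such a selector could be consulted from a `case` site follows. The names `should_include_kernel_dtype`, `SKETCH_SELECTIVE_CASE`, and `element_size` are hypothetical, and throwing on an unselected dtype is an assumed policy for illustration; the skeleton described above simply returns `true` for everything.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum class ScalarType { Float, Double };  // simplified stand-in

// Phase [2] skeleton (hypothetical name): select every dtype for every kernel.
// A later, code-generated version would return true only for the
// (kernel tag, dtype) pairs observed while tracing the requested models.
constexpr bool should_include_kernel_dtype(const char* /*kernel_tag*/,
                                           ScalarType /*dtype*/) {
  return true;
}

// Because the selector is constexpr, its result is usable at compile time.
static_assert(should_include_kernel_dtype("add_kernel", ScalarType::Float),
              "the no-op skeleton selects every dtype");

// Case macro that consults the selector before the dtype-specific body.
// With a constant result, the compiler can fold the check and discard
// whichever branch is unreachable.
#define SKETCH_SELECTIVE_CASE(NAME, enum_type, type, ...)             \
  case enum_type: {                                                   \
    if (!should_include_kernel_dtype(NAME, enum_type)) {              \
      throw std::runtime_error(std::string(NAME) +                    \
                               ": dtype not selected in this build"); \
    }                                                                 \
    using scalar_t = type;                                            \
    return __VA_ARGS__();                                             \
  }

inline std::size_t element_size(ScalarType t) {
  switch (t) {
    SKETCH_SELECTIVE_CASE("element_size", ScalarType::Float, float,
                          [&] { return sizeof(scalar_t); })
    SKETCH_SELECTIVE_CASE("element_size", ScalarType::Double, double,
                          [&] { return sizeof(scalar_t); })
  }
  throw std::runtime_error("element_size: unsupported dtype");
}
```

Keeping the selector `constexpr` is what makes it possible, in principle, to swap in a generated table later without touching the call sites.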
ghstack-source-id: 118336675
Test Plan: See the next few diffs in the stack, which apply this change both to record triggered dtypes (in kernel functions) and to select dtype-specific portions of kernel functions.
Reviewed By: ezyang
Differential Revision: D24220926
fbshipit-source-id: d7dbf21c7dcc6ce981d0fd4dcb62ca829fe3f69d