Rationalize inlining of kernels into the unboxing wrapper (#42845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42845
- In server builds, always allow the compiler to inline the kernel into the unboxing wrapper, i.e. optimize for performance.
- In mobile builds, never inline the kernel into the unboxing wrapper, i.e. optimize for binary size.
Note that this only applies to registration API calls where the kernel can actually be inlined, i.e. calls with `TORCH_FN` or some of the old API calls.
Registrations that pass the registration API a runtime function pointer cannot be inlined, so they are not inlined in server builds either.
Note also that in server builds, all we do is **allow** the compiler to inline. We don't force inlining.
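For illustration only, here is a minimal standalone sketch of the policy described above. It is not the actual PyTorch code: `SKETCH_MOBILE_BUILD`, `SKETCH_KERNEL_CALL`, `Stack`, `CompileTimeKernel`, `RuntimeKernel` and `unboxing_wrapper` are hypothetical names made up for this example. It assumes the kernel is either known at compile time as a template argument (roughly the situation with `TORCH_FN`) or only known at runtime as a function pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical build switch and attribute macro (assumptions, not PyTorch names).
// "Mobile": forbid inlining of the kernel call to keep binary size down.
// "Server": add nothing, so the compiler is allowed (not forced) to inline.
#if defined(SKETCH_MOBILE_BUILD)
#  if defined(_MSC_VER)
#    define SKETCH_KERNEL_CALL __declspec(noinline)
#  else
#    define SKETCH_KERNEL_CALL __attribute__((noinline))
#  endif
#else
#  define SKETCH_KERNEL_CALL
#endif

using Kernel = std::int64_t (*)(std::int64_t, std::int64_t);

// Stand-in for a boxed argument stack holding two integer arguments.
struct Stack {
  std::int64_t slots[2];
};

// Case 1: the kernel is a compile-time constant, so the compiler can see
// through the call and may inline it unless the attribute forbids it.
template <Kernel kernel>
struct CompileTimeKernel {
  SKETCH_KERNEL_CALL static std::int64_t call(std::int64_t a, std::int64_t b) {
    return kernel(a, b);
  }
};

// Unboxing wrapper: pulls the arguments off the stack and calls the kernel.
template <Kernel kernel>
std::int64_t unboxing_wrapper(const Stack& stack) {
  return CompileTimeKernel<kernel>::call(stack.slots[0], stack.slots[1]);
}

// Case 2: the kernel is only known at runtime. The call goes through a
// stored pointer, so there is nothing to inline in any build flavor.
struct RuntimeKernel {
  Kernel fn;
  std::int64_t unbox_and_call(const Stack& stack) const {
    return fn(stack.slots[0], stack.slots[1]);
  }
};

// Example kernel used by the self-check below.
std::int64_t add(std::int64_t a, std::int64_t b) { return a + b; }

// Both registration styles must compute the same result; only the
// inlining opportunity differs.
inline void sketch_self_check() {
  Stack stack{{2, 3}};
  RuntimeKernel runtime_kernel{&add};
  assert(unboxing_wrapper<&add>(stack) == runtime_kernel.unbox_and_call(stack));
}
```

In this sketch the server configuration adds no attribute at all, which matches the point that inlining is allowed rather than forced; only the mobile configuration changes what the compiler may do.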
ghstack-source-id: 114177591
Test Plan:
waitforsandcastle
https://www.internalfb.com/intern/fblearner/details/225217260/
Reviewed By: ezyang
Differential Revision: D23045772
fbshipit-source-id: f74fd600eaa3f5cfdf0da47ea080801a03db7917