Allow inlining of more Tensor methods (#53905)
Summary:
This `is_meta` call in `TensorIterator` shows up in profiling as around 4-5% of fast setup time:
https://github.com/pytorch/pytorch/blob/49a5f99440bde6a2f214e9c0b64c3ae0fdfb5a59/aten/src/ATen/TensorIterator.cpp#L886
After inlining, `is_meta()` compiles to a single `test` instruction, saving 20-30 ns per operator call. The functions I'm moving into the header here are all similar in that they inline away to almost nothing.
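For illustration, a minimal sketch of why a header-defined accessor can collapse to a bit test. `Key`, `KeySet`, and `TensorLike` are hypothetical stand-ins invented for this example, not the actual ATen types, and the bit layout is an assumption:

```cpp
#include <cstdint>

// Hypothetical dispatch-key bits; names and layout are illustrative only.
enum class Key : std::uint64_t {
  CPU = 1ull << 0,
  CUDA = 1ull << 1,
  Meta = 1ull << 2,
};

// Hypothetical key set: one bit per key, stored in a single 64-bit word.
class KeySet {
 public:
  constexpr KeySet() = default;
  constexpr explicit KeySet(Key k) : bits_(static_cast<std::uint64_t>(k)) {}

  // Membership check is a single bitwise AND plus a compare against zero.
  constexpr bool has(Key k) const {
    return (bits_ & static_cast<std::uint64_t>(k)) != 0;
  }

 private:
  std::uint64_t bits_ = 0;
};

// Hypothetical tensor-like class whose accessor lives in the header.
class TensorLike {
 public:
  explicit TensorLike(KeySet keys) : keys_(keys) {}

  // Defined in the class body, so it is implicitly inline and its
  // definition is visible to every translation unit that includes the
  // header. The compiler can then fold the call into the caller instead
  // of emitting a call into a separately compiled .cpp file.
  bool is_meta() const { return keys_.has(Key::Meta); }

 private:
  KeySet keys_;
};
```

If `is_meta()` were only declared in the header and defined in a `.cpp` file, callers in other translation units would pay for a real function call unless link-time optimization were enabled.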
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53905
Reviewed By: gchanan
Differential Revision: D27513232
Pulled By: swolchok
fbshipit-source-id: 33ec9eefecd0ddebc285e1d830edb558818dc391