Inline a function that I noticed was hot during previous benchmarking (#50848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50848
I noticed that the call overhead from `Tensor::device()` accounted for ~1-2% of instruction counts, depending on the microbenchmark.
Some nice-looking instruction count wins: https://www.internalfb.com/intern/paste/P164529004/
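For readers unfamiliar with why inlining helps here, a minimal sketch of the general pattern follows. It is not the actual diff: `TensorSketch`, `TensorImplSketch`, and the simplified `Device` struct are hypothetical stand-ins for the real ATen/c10 types. The idea is that an accessor defined in the header is visible to every caller, so the compiler can replace the call with a direct field load instead of emitting an out-of-line function call.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins; these are NOT the real c10/ATen types.
enum class DeviceType : std::int8_t { CPU, CUDA };

struct Device {
  DeviceType type;
  std::int8_t index;
};

class TensorImplSketch {
 public:
  explicit TensorImplSketch(Device d) : device_(d) {}

  // Defined inside the class body, so it is implicitly inline. Any caller
  // that includes this header can have the call folded into a field load.
  Device device() const { return device_; }

 private:
  Device device_;
};

class TensorSketch {
 public:
  explicit TensorSketch(TensorImplSketch* impl) : impl_(impl) {}

  // Also defined in the header. Had this been declared here and defined in a
  // separate .cpp file, each call would pay function-call overhead unless
  // link-time optimization happened to inline it.
  Device device() const { return impl_->device(); }

 private:
  TensorImplSketch* impl_;  // non-owning pointer, for brevity in this sketch
};
```

Whether the compiler actually inlines a given call remains its decision; moving the definition into the header only makes inlining possible across translation units.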
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25984136
Pulled By: bdhirsh
fbshipit-source-id: 0e54f2afe78caeb5a03abbb15e9197556acfeca1