Make PythonArgs::tensor and PythonArgs::scalar faster (#22782)
Summary:
Speeds up the common case where the argument is exactly a torch.Tensor
(not a subclass). This reduces the number of instructions executed for a
torch.add(tensor1, tensor2) call by ~328 (should be ~65 ns faster).
Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the .cpp file.
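As a rough illustration of the exact-type fast path described above, the
sketch below checks for the exact type with one pointer comparison before
falling back to a slower subclass walk. It is not the actual PythonArgs
code: TypeObject, Object, is_subtype and is_tensor are hypothetical
stand-ins, and the real accessors work on CPython objects.

```cpp
#include <cassert>

// Hypothetical stand-in for a Python type object (not a PyTorch/CPython type).
struct TypeObject {
    const TypeObject* base;  // parent type, or nullptr for a root type
};

// Hypothetical stand-in for a Python object carrying a type pointer.
struct Object {
    const TypeObject* type;
};

// Slow path: walk the inheritance chain, loosely analogous to an
// isinstance-style check.
inline bool is_subtype(const TypeObject* t, const TypeObject* target) {
    for (; t != nullptr; t = t->base) {
        if (t == target) {
            return true;
        }
    }
    return false;
}

// Fast path first: one pointer comparison covers the common case where the
// object is exactly the target type; only subclasses pay for the walk.
inline bool is_tensor(const Object& obj, const TypeObject* tensor_type) {
    if (obj.type == tensor_type) {
        return true;
    }
    return is_subtype(obj.type, tensor_type);
}
```

The design point is that the common case returns after a single
comparison, and a function this small stays cheap to inline.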
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782
Differential Revision: D16223592
Pulled By: colesbury
fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1