Reduce time spent per guard by comparing TensorType with Tensor (#39098)
Summary:
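It mainly reduces the time spent allocating a new TensorType object for each Tensor being guarded: instead of constructing a TensorType from the Tensor and comparing the two types, the guard now compares the TensorType against the Tensor directly.
Benchmark results before and after this PR: https://gist.github.com/ailzhang/db44d0a1911cae62e0bb794bff33f40a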
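A minimal C++ sketch of the idea follows. It uses hypothetical, heavily simplified stand-ins for at::Tensor and c10::TensorType. The field set, `create`, and `matchTensor` shown here are illustrative assumptions, not the actual PyTorch signatures. The old guard path heap-allocates a fully specified type for every check. The new path compares the stored fields against the tensor in place and reaches the same result without allocating.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical simplified stand-ins; not the real at::Tensor / c10::TensorType.
enum class ScalarType { Float, Double, Long };

struct Tensor {
  ScalarType dtype;
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
  bool requires_grad;
};

struct TensorType {
  // Unset (empty) fields mean "unknown" in the profiled type.
  std::optional<ScalarType> dtype;
  std::optional<std::vector<int64_t>> sizes;
  std::optional<std::vector<int64_t>> strides;
  std::optional<bool> requires_grad;

  // Old path helper: build a fully specified type from a Tensor (heap allocation).
  static std::shared_ptr<TensorType> create(const Tensor& t) {
    return std::make_shared<TensorType>(
        TensorType{t.dtype, t.sizes, t.strides, t.requires_grad});
  }

  bool operator==(const TensorType& other) const {
    return dtype == other.dtype && sizes == other.sizes &&
           strides == other.strides && requires_grad == other.requires_grad;
  }

  // New path: compare each stored field with the Tensor directly.
  // An empty optional never equals a value, so an unset field fails the match,
  // just as it would fail operator== against a fully specified type.
  bool matchTensor(const Tensor& t) const {
    return dtype == t.dtype && sizes == t.sizes && strides == t.strides &&
           requires_grad == t.requires_grad;
  }
};

// Guard before: allocates a temporary TensorType on every call.
bool guardOld(const TensorType& profiled, const Tensor& t) {
  return profiled == *TensorType::create(t);
}

// Guard after: no allocation, same answer.
bool guardNew(const TensorType& profiled, const Tensor& t) {
  return profiled.matchTensor(t);
}
```

Both guards return the same result for any input. The saving comes only from skipping the per-call `make_shared`, which matters because guards run on every invocation of the optimized graph.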
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39098
Differential Revision: D21786678
Pulled By: ailzhang
fbshipit-source-id: 2f61f0ac1dc8c529c45bef4e149be431ff1608b0