Add gtest for TensorIterator (#21253)
Summary:
This adds a regression test for the bug fix in #21236. Operations
involving CUDA tensors and CPU scalars should not copy the CPU scalar to
the device, because that host-to-device copy is slow. Instead, they
should "lift" the scalar into a kernel parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21253
Reviewed By: bddppq
Differential Revision: D15604080
Pulled By: colesbury
fbshipit-source-id: c14ded5d584499eaa5ea83337ffc50278205f3d6