Move where CUDA implementation to TensorIterator (#33228)
Summary:
Reopen of https://github.com/pytorch/pytorch/pull/32984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33228
Differential Revision: D19850862
Pulled By: ngimel
fbshipit-source-id: b92446a49b4980188fa4788220a2164650e905c2