Add dynamic_cast asserts to CPU Loops. (#39258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39258
On CUDA, we currently support casting loops dynamically (i.e. when the argument or return types of the lambda don't match the dtypes of the TensorIterator).
On CPU, before this change we would essentially reinterpret_cast the data in that case; now we internal assert instead. We could add dynamic_casting support on CPU in the future.
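
To illustrate the difference, here is a minimal standalone sketch. It is not the actual Loops.h code: `DType`, `Buffer`, `make_buffer`, `unchecked_unary_loop` and `checked_unary_loop` are hypothetical names invented for this example, and a thrown `std::logic_error` stands in for the internal assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical dtype tag for this sketch; not PyTorch's ScalarType.
enum class DType { Float32, Int32 };

// Maps a C++ type to its tag. Only the two types below are supported.
template <typename T> DType dtype_of();
template <> inline DType dtype_of<float>() { return DType::Float32; }
template <> inline DType dtype_of<std::int32_t>() { return DType::Int32; }

// Hypothetical type-erased buffer: raw bytes plus the dtype they hold.
struct Buffer {
  DType dtype;
  std::vector<unsigned char> bytes;
};

template <typename T>
Buffer make_buffer(const std::vector<T>& values) {
  Buffer b{dtype_of<T>(), std::vector<unsigned char>(values.size() * sizeof(T))};
  if (!values.empty()) {
    std::memcpy(b.bytes.data(), values.data(), b.bytes.size());
  }
  return b;
}

// Old CPU behavior (sketch): read the bytes as T no matter what the buffer
// holds. memcpy is used so the reinterpretation itself is well defined.
template <typename T, typename F>
std::vector<T> unchecked_unary_loop(const Buffer& in, F op) {
  assert(in.bytes.size() % sizeof(T) == 0);
  std::vector<T> out(in.bytes.size() / sizeof(T));
  for (std::size_t i = 0; i < out.size(); ++i) {
    T x;
    std::memcpy(&x, in.bytes.data() + i * sizeof(T), sizeof(T));
    out[i] = op(x);
  }
  return out;
}

// New CPU behavior (sketch): refuse to run when the lambda's type does not
// match the buffer's dtype, instead of silently reinterpreting the bytes.
template <typename T, typename F>
std::vector<T> checked_unary_loop(const Buffer& in, F op) {
  if (in.dtype != dtype_of<T>()) {
    throw std::logic_error("loop type does not match buffer dtype");
  }
  return unchecked_unary_loop<T>(in, op);
}
```

In this sketch, running a `float` lambda over an int32 buffer through `unchecked_unary_loop` yields values built from the integers' bit patterns rather than from their numeric values, while `checked_unary_loop` throws before touching the data.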
Test Plan: Imported from OSS
Differential Revision: D21790020
Pulled By: gchanan
fbshipit-source-id: b52f4340a0553f0c1bd8fafaa58309bc110adecf