Loops: Separate out dynamic_casting concerns from complex overloads. (#39254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39254
dynamic_casting is meant to handle CUDA kernels when the operand dtypes don't match the C++ kernel function types.
This is complicated by the current state of complex support, which uses three types: thrust::complex, std::complex, and c10::complex.
Currently, thrust::complex and std::complex are reported as needing dynamic casting even though we don't actually cast them.
But making them not need dynamic casting doesn't work either, because certain dynamic_casting optimizations don't work with thrust::complex and (maybe) std::complex.
So we separate out these concerns, which lets us iterate on the dynamic_casting checks, in particular by applying them to CPU.
This PR should have no functional change.
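For illustration only, the separation described above might look roughly like the sketch below. It is not the code in this PR: `DType`, `dtype_of`, `alt_complex`, `is_legacy_complex`, `Path`, and `choose_path` are all hypothetical names, and `alt_complex` is a stand-in for a second complex representation such as thrust::complex. The point is that "do the dtypes mismatch?" and "is this a complex overload that needs its own path?" become two independent checks.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical runtime dtype tag (stand-in for a real scalar-type enum).
enum class DType { Float, Double, ComplexFloat, ComplexDouble };

// Hypothetical second complex representation (stand-in for e.g. thrust::complex).
template <typename T> struct alt_complex { T re, im; };

// Map each C++ kernel type to its runtime dtype tag.
template <typename T> struct dtype_of;
template <> struct dtype_of<float> { static constexpr DType value = DType::Float; };
template <> struct dtype_of<double> { static constexpr DType value = DType::Double; };
template <> struct dtype_of<std::complex<float>> { static constexpr DType value = DType::ComplexFloat; };
template <> struct dtype_of<std::complex<double>> { static constexpr DType value = DType::ComplexDouble; };
template <> struct dtype_of<alt_complex<float>> { static constexpr DType value = DType::ComplexFloat; };
template <> struct dtype_of<alt_complex<double>> { static constexpr DType value = DType::ComplexDouble; };

// Concern 1: pure dtype mismatch check, with no knowledge of complex types.
template <typename T>
bool needs_dynamic_casting(DType operand) {
  return operand != dtype_of<T>::value;
}

// Concern 2: is T one of the complex overloads that must avoid the
// dynamic_casting optimizations? (Assumed here to be a compile-time property.)
template <typename T> struct is_legacy_complex : std::false_type {};
template <typename T> struct is_legacy_complex<std::complex<T>> : std::true_type {};
template <typename T> struct is_legacy_complex<alt_complex<T>> : std::true_type {};

enum class Path { Fast, Cast, LegacyComplex };

// Dispatch consults the two concerns independently.
template <typename T>
Path choose_path(DType operand) {
  if (is_legacy_complex<T>::value) {
    return Path::LegacyComplex;  // handled separately, never "cast"
  }
  return needs_dynamic_casting<T>(operand) ? Path::Cast : Path::Fast;
}
```

With this shape, the mismatch check can be reused or changed (for example on CPU) without touching how the complex overloads are routed.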
Test Plan: Imported from OSS
Differential Revision: D21788870
Pulled By: gchanan
fbshipit-source-id: 5d69c9851423dee2fbe789674f4306710378f4ff