Move device type init from BackendSelect to backend kernels (#37402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37402
Previously, BackendSelect kernels did just-in-time device type
initialization by calling `LegacyTypeDispatch.initForDispatchKey()`
with a computed dispatch key. Here we move the initialization to
the backend kernels themselves, where we can call the device-
specific initializer directly.
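
A rough sketch of the before/after shape, assuming a two-backend
world. Every name here (`initCPU`, `initCUDA`, `initForDeviceType`,
and the `empty_*` functions) is a hypothetical stand-in for
illustration, not an actual ATen symbol:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are the real ATen API.
enum class DeviceType { CPU, CUDA };

// Records which device initializer ran, so the two schemes can be compared.
std::vector<std::string> init_log;

void initCPU()  { init_log.push_back("cpu"); }
void initCUDA() { init_log.push_back("cuda"); }

// Old scheme: a generic entry point picks the initializer at run time
// from a computed key (modeled here as a DeviceType).
void initForDeviceType(DeviceType t) {
  switch (t) {
    case DeviceType::CPU:  initCPU();  break;
    case DeviceType::CUDA: initCUDA(); break;
  }
}

// Backend kernels under the old scheme: no initialization of their own.
int empty_cpu_old()  { return 0; }
int empty_cuda_old() { return 1; }

// Old BackendSelect-style kernel: initialize just in time, then dispatch.
int empty_backend_select_old(DeviceType t) {
  initForDeviceType(t);
  return t == DeviceType::CPU ? empty_cpu_old() : empty_cuda_old();
}

// New scheme: each backend kernel calls its own device-specific
// initializer directly, so the selecting kernel only dispatches.
int empty_cpu_new()  { initCPU();  return 0; }
int empty_cuda_new() { initCUDA(); return 1; }

int empty_backend_select_new(DeviceType t) {
  return t == DeviceType::CPU ? empty_cpu_new() : empty_cuda_new();
}
```

In this sketch both paths run the same initializer for a given
device; the difference is only where the call lives, and the new
form needs no run-time mapping from key to initializer.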
Putting this up to run tests, but a couple of questions remain:
* Why were only BackendSelect kernels doing this initialization?
Not all factory ops appear there, and not every op that appears
there is a factory op. Currently we generate init code for
exactly the BackendSelect ops, but the choice should be better
motivated.
* The previous scheme maps HIP to its own legacy type dispatch
entry, but the logic assumes it's exclusive with CUDA, and no
ops appear to mention HIP explicitly, so the new logic doesn't
expose a static entry point for it. This needs to be verified.
Test Plan: Imported from OSS
Differential Revision: D21282974
Pulled By: bhosmer
fbshipit-source-id: cd46eb788596948e0572a15fac0f8b43feca5d75