Allow overwriting catch-all kernels (#25947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25947
Previously, the c10 dispatcher didn't allow an op to have a catch-all kernel and backend-specific kernels at the same time.
That restriction is still the long-term goal, but to make the current XLA implementation work, we need to let XLA overwrite ops that have catch-all kernels with XLA variants.
This diff changes that so that ops can have both catch-all and backend-specific kernels; the dispatcher calls the catch-all kernel if no more specific kernel is registered.
This is also the current behavior of globalATenDispatch.
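The lookup rule described above can be illustrated with a minimal Python sketch. This is a toy model only, not the c10 dispatcher API: `OpEntry`, `register`, and `call` are hypothetical names chosen for illustration, and backends are represented as plain strings.

```python
from typing import Callable, Dict, Optional


class OpEntry:
    """Toy kernel table for a single op (hypothetical, not the c10 API)."""

    def __init__(self) -> None:
        # Kernels registered for one specific backend, keyed by backend name.
        self.backend_kernels: Dict[str, Callable[..., str]] = {}
        # Optional kernel used when no backend-specific kernel matches.
        self.catch_all: Optional[Callable[..., str]] = None

    def register(self, kernel: Callable[..., str], backend: Optional[str] = None) -> None:
        # With no backend given, the kernel becomes the catch-all kernel.
        if backend is None:
            self.catch_all = kernel
        else:
            self.backend_kernels[backend] = kernel

    def call(self, backend: str, *args: object) -> str:
        # Prefer the backend-specific kernel; otherwise use the catch-all.
        kernel = self.backend_kernels.get(backend, self.catch_all)
        if kernel is None:
            raise RuntimeError(f"no kernel registered for backend {backend!r}")
        return kernel(*args)


op = OpEntry()
op.register(lambda x: f"generic({x})")             # catch-all kernel
op.register(lambda x: f"xla({x})", backend="XLA")  # XLA variant overrides it
xla_result = op.call("XLA", 1)  # uses the XLA kernel
cpu_result = op.call("CPU", 1)  # falls back to the catch-all kernel
```

In this sketch, registering the XLA kernel does not remove the catch-all kernel; it only takes precedence for calls dispatched to XLA.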
ghstack-source-id: 90049398
Test Plan: unit tests
Differential Revision: D17293036
fbshipit-source-id: f2d5928e904c1dc9b6b89e9bb468debe48a4056c