Use normal dispatch to get to CUDA threshold kernels, instead of DispatchStub. (#30307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30307
DispatchStub will stop working once I split the CPU and CUDA libraries, because
some template symbols used by the DispatchStub stubs are not properly exported
across the library boundary, and I couldn't figure out how to make them dispatch correctly.
This is the only case where DispatchStub is being used to dispatch to CUDA,
anyway.
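For context, the difference between the two dispatch paths can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual ATen code; the names `ThresholdStub`, `threshold_kernel_cpu`, `threshold_kernel_cuda`, and `threshold_cuda` are hypothetical stand-ins. A DispatchStub holds per-device function pointers and picks one at call time, which requires the CUDA library to fill in its pointer across the library boundary; normal dispatch instead registers the CUDA implementation with the dispatcher directly, so the call goes straight to the CUDA kernel with no stub in between.

```cpp
// Schematic sketch only (assumed names, not the real ATen implementation).
#include <cstdio>
#include <stdexcept>

enum class DeviceType { CPU, CUDA };

// --- DispatchStub-style indirection (the pattern this PR stops using for threshold) ---
struct ThresholdStub {
  using fn_t = void (*)(float threshold, float value);
  fn_t cpu_fn = nullptr;
  fn_t cuda_fn = nullptr;  // would be filled in by the CUDA library at load time
  void operator()(DeviceType dev, float threshold, float value) {
    fn_t fn = (dev == DeviceType::CUDA) ? cuda_fn : cpu_fn;
    if (!fn) throw std::runtime_error("no kernel registered for this device");
    fn(threshold, value);  // device chosen inside the stub, at call time
  }
};

void threshold_kernel_cpu(float t, float v)  { std::printf("CPU threshold(%g, %g)\n", t, v); }
void threshold_kernel_cuda(float t, float v) { std::printf("CUDA threshold(%g, %g)\n", t, v); }

ThresholdStub threshold_stub;

// --- "Normal" dispatch: the backend-specific function is registered with the
// dispatcher under its own key, so CUDA calls reach it with no stub in between.
void threshold_cuda(float t, float v) { threshold_kernel_cuda(t, v); }

int main() {
  // Stub-based path: both backends register into the same stub object.
  threshold_stub.cpu_fn  = &threshold_kernel_cpu;
  threshold_stub.cuda_fn = &threshold_kernel_cuda;
  threshold_stub(DeviceType::CUDA, 0.0f, 1.0f);

  // Normal-dispatch path: the dispatcher (elided here) would invoke
  // threshold_cuda directly for CUDA tensors.
  threshold_cuda(0.0f, 1.0f);
  return 0;
}
```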
This partially addresses #29844, but I still need to delete the CUDA
registration logic from DispatchStub entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762362
Pulled By: ezyang
fbshipit-source-id: bdfa8739c0daf23badf3c5af61890a934af00813