Optimize performance for unboxed-only kernels (#25055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25055
An ATen kernel registered with the c10 dispatcher doesn't need a kernel
cache, so skip the cache creator function call when such a kernel is looked up.
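
A minimal sketch of the idea, not the actual c10 code: every name below
(KernelCache, CacheCreator, KernelEntry, LookedUpKernel, lookup) is a
hypothetical stand-in. It assumes a kernel that needs no cache is
registered with an empty cache creator, and that lookup only invokes the
creator when one is present.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-ins for illustration; these are not the real c10 types.
struct KernelCache {
  virtual ~KernelCache() = default;
};

using CacheCreator = std::function<std::unique_ptr<KernelCache>()>;

// What a registration might store for one kernel.
struct KernelEntry {
  int (*fn)(int);              // the unboxed kernel function
  CacheCreator cache_creator;  // left empty when the kernel needs no cache
};

// What a lookup hands back to the caller.
struct LookedUpKernel {
  int (*fn)(int);
  std::unique_ptr<KernelCache> cache;  // stays null when no cache is needed
};

// Invoke the cache creator only if one was registered. Kernels without a
// creator skip both the std::function call and the allocation behind it.
inline LookedUpKernel lookup(const KernelEntry& entry) {
  LookedUpKernel result{entry.fn, nullptr};
  if (entry.cache_creator) {
    result.cache = entry.cache_creator();
  }
  return result;
}

inline int add_one(int x) { return x + 1; }

// Usage: one kernel without a cache creator, one with.
inline void run_example() {
  int creator_calls = 0;
  KernelEntry no_cache{&add_one, CacheCreator{}};
  KernelEntry with_cache{&add_one, [&creator_calls] {
    ++creator_calls;
    return std::make_unique<KernelCache>();
  }};

  LookedUpKernel a = lookup(no_cache);
  assert(a.cache == nullptr);   // creator was never called
  assert(creator_calls == 0);

  LookedUpKernel b = lookup(with_cache);
  assert(b.cache != nullptr);   // creator ran exactly once
  assert(creator_calls == 1);
}
```

The branch on an empty creator is the whole optimization in this sketch;
how the real dispatcher represents "no cache needed" may differ.
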
ghstack-source-id: 88834902
Test Plan: unit tests
Differential Revision: D16974248
fbshipit-source-id: 5f9e65d706ec5f836804cb6e5f693f5a01f66714