Boxed variable dispatch (#29934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29934
Previously, when doing boxed dispatch (e.g., for custom ops), the dispatcher manually removed the VariableTensorId flag before dispatching,
because custom ops don't have variable kernels.
This was one of the blockers that prevented us from using the boxed dispatch mechanism for ops from native_functions.yaml, because those ops define variable kernels and need them to be called for autograd.
This PR changes that. The dispatcher no longer removes the VariableTensorId flag.
Instead, to keep custom ops working, we implement a variable fallback kernel that is called whenever no other variable kernel is found.
ghstack-source-id: 94618474
Test Plan: unit tests
Differential Revision: D18542342
fbshipit-source-id: a30ae35d98f89f7ae507151f55c42cfbed54a451