pytorch
f03a8f05 - [reland] Deprecate registering autograd kernels at not an autograd key (#105078)


Commit

1 year ago

[reland] Deprecate registering autograd kernels at not an autograd key (#105078)

Summary:

Context
-------
This PR adds a new fallback to the Autograd dispatch keys.

If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call `torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key, like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8

It is possible that this PR regresses performance of overhead-bound models. If this is the case, please reach out (and apply one of the temporary fixes listed above).

Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at a key that is not an autograd key, we add a fallback to the Autograd dispatch keys. This fallback raises a warning if the user attempts to backprop through the operator and is also configurable to either warn or not warn.

The goal of this PR is to
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong
- be as performant as possible

There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then we install an autograd hook that raises a warning. We are preserving BC in that it is possible that the user has a torch::autograd::Function registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad, then we make them require grad and install a WarnNotImplemented grad fn that warns in the backward pass. This is mildly BC-breaking (see next section).

Test Plan:
- bunch of new tests

BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects custom operators that do not have a kernel registered to the Autograd keys (e.g. AutogradCPU and AutogradCUDA).

If the previous behavior was that the custom operator would return Tensors that do not require grad even when the inputs do require grad, then this PR changes it so that all floating-point and complex returns do require grad. See the "Context" section above for how to get the old behavior.

Differential Revision: D47408353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105078
Approved by: https://github.com/soulitzer
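
The two cases in "Description for reviewers" can be sketched as a small decision function. This is a toy model in plain Python, not PyTorch's implementation: `FakeTensor`, `fallback_action`, the returned action strings, and the `"warn"` mode name are all hypothetical and exist only for illustration. Only the `"nothing"` mode string comes from the commit message. The sketch also assumes that the fallback only acts when at least one input requires grad, which matches the BC-breaking note.

```python
from dataclasses import dataclass

# Kinds of dtype the BC-breaking note says become requires-grad:
# floating-point and complex returns.
DIFFERENTIABLE_KINDS = {"float", "complex"}


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; not a PyTorch type."""
    kind: str                    # "float", "complex", "int", or "bool"
    requires_grad: bool = False


def fallback_action(inputs, output, mode="warn"):
    """Return what the toy fallback would do for a single output.

    mode="nothing" models the opt-out described under "Context";
    mode="warn" is an assumed name for the default warning behavior.
    """
    if mode == "nothing":
        return "passthrough"
    if not any(t.requires_grad for t in inputs):
        # Nothing to backprop through, so there is nothing to warn about.
        return "passthrough"
    if output.requires_grad:
        # Case 1: the kernel already produced a graph (for example via an
        # autograd Function registered to a backend key); keep it and warn.
        return "install_warning_hook"
    if output.kind in DIFFERENTIABLE_KINDS:
        # Case 2: mark the output as requiring grad and warn in backward.
        return "set_requires_grad_and_warn_in_backward"
    # Integer and boolean outputs cannot require grad; leave them alone.
    return "passthrough"
```

For example, a float input that requires grad combined with a float output that does not falls into the second case, which is the mildly BC-breaking path; passing `mode="nothing"` models the opt-out and always yields `"passthrough"`.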

Author

zou3519

Committer

pytorchmergebot

Parents

b4d91b1c
