Support enough of closures to write autograd functions (#15411)
Summary:
This PR adds enough of the infrastructure for closures (inner script functions) to let us express symbolic gradients with them. We never actually run graphs that contain these closures: the symbolic_script infrastructure extracts them from the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations needed to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.
Example:
This code:
```
import torch
r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
    def backward(grad_output):
        grad_self = (grad_output * other).sum_to_size(self.size())
        grad_other = (grad_output * self).sum_to_size(other.size())
        return grad_self, grad_other
    return self * other, backward
''')
print(r.module.code)
```
Will produce this graph (pretty printed for clarity):
```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  backward = (self.__lambda, (other, self))
  return (torch.mul(self, other), backward)

def __lambda(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
symbolic_script will then do some modifications to remove the unsupported prim::Function node, yielding:
```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  return (torch.mul(self, other), (other, self))

def backward(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
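For intuition, the closure-to-context rewrite above can be mimicked in plain Python. This is only a sketch, not the symbolic_script implementation: it uses scalars instead of tensors (so no sum_to_size is needed), and the names mul_forward_closure, mul_forward_lifted, and mul_backward_lifted are made up for illustration.
```python
# Plain-Python sketch of the rewrite; scalars stand in for tensors.
# All function names here are hypothetical and not part of PyTorch.

def mul_forward_closure(a, b):
    """Forward pass that returns its gradient function as a closure."""
    def backward(grad_output):
        # d(a*b)/da = b and d(a*b)/db = a
        return grad_output * b, grad_output * a
    return a * b, backward


def mul_forward_lifted(a, b):
    """Same forward pass, but captured values are returned as an explicit context."""
    context = (b, a)  # mirrors the (other, self) capture order in the printed graph
    return a * b, context


def mul_backward_lifted(context, grad_output):
    """Standalone backward that unpacks the context instead of closing over variables."""
    b, a = context
    return grad_output * b, grad_output * a


out, backward = mul_forward_closure(3.0, 4.0)
out_lifted, ctx = mul_forward_lifted(3.0, 4.0)
# Both formulations compute the same output and the same gradients.
assert out == out_lifted
assert backward(2.0) == mul_backward_lifted(ctx, 2.0)
```
The closure form is what the author writes; the lifted form corresponds to the discrete forward/backward pair, with the captured variables passed explicitly as the context tuple.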
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411
Differential Revision: D13523340
Pulled By: zdevito
fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49