pytorch
04b04c06 - [Compiled Autograd] Turn accumulate_grad into an op (#111271)

Rather than baking the behavior of `AccumulateGrad` nodes into the generated graph (either as `+=` or as a return value of the graph), this change introduces a new `accumulate_grad_` dispatcher op that is included in the generated graph like:

```
def forward(self, inputs, sizes, hooks):
    getitem = inputs[0]
    getitem_1 = inputs[1]
    getitem_2 = inputs[2]
    getitem_3 = inputs[3]
    getitem_4 = inputs[4]
    getitem_5 = inputs[5]
    getitem_6 = inputs[6]
    getitem_7 = inputs[7]
    getitem_8 = inputs[8]
    getitem_9 = inputs[9];  inputs = None
    expand = torch.ops.aten.expand.default(getitem, [2, 4]);  getitem = None
    threshold_backward = torch.ops.aten.threshold_backward.default(expand, getitem_1, 0);  expand = getitem_1 = None
    t = torch.ops.aten.t.default(getitem_3);  getitem_3 = None
    mm = torch.ops.aten.mm.default(threshold_backward, t);  t = None
    t_1 = torch.ops.aten.t.default(threshold_backward)
    mm_1 = torch.ops.aten.mm.default(t_1, getitem_2);  t_1 = getitem_2 = None
    t_2 = torch.ops.aten.t.default(mm_1);  mm_1 = None
    sum_1 = torch.ops.aten.sum.dim_IntList(threshold_backward, [0], True);  threshold_backward = None
    view = torch.ops.aten.view.default(sum_1, [4]);  sum_1 = None
    t_3 = torch.ops.aten.t.default(t_2);  t_2 = None
    accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, t_3);  getitem_4 = t_3 = None
    threshold_backward_1 = torch.ops.aten.threshold_backward.default(mm, getitem_5, 0);  mm = getitem_5 = None
    t_4 = torch.ops.aten.t.default(threshold_backward_1)
    mm_2 = torch.ops.aten.mm.default(t_4, getitem_6);  t_4 = getitem_6 = None
    t_5 = torch.ops.aten.t.default(mm_2);  mm_2 = None
    sum_2 = torch.ops.aten.sum.dim_IntList(threshold_backward_1, [0], True);  threshold_backward_1 = None
    view_1 = torch.ops.aten.view.default(sum_2, [4]);  sum_2 = None
    t_6 = torch.ops.aten.t.default(t_5);  t_5 = None
    accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_7, t_6);  getitem_7 = t_6 = None
    accumulate_grad__2 = torch.ops.inductor.accumulate_grad_.default(getitem_8, view_1);  getitem_8 = view_1 = None
    accumulate_grad__3 = torch.ops.inductor.accumulate_grad_.default(getitem_9, view);  getitem_9 = view = None
    return []
```

The motivation here is that `AccumulateGrad` nodes are causing trouble in FSDP tracing, since FSDP resizes parameters and parameter storage in place inside hooks. We will model this mutation in dynamo, but not during the initial compiled autograd capture. This allows us to bypass failing shape checks in the initial capture.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111271
Approved by: https://github.com/voznesenskym
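For readers unfamiliar with what `AccumulateGrad` does for a leaf tensor, the sketch below approximates its semantics in plain Python: either install a fresh `.grad` or add into the existing one. The helper name `accumulate_grad_sketch` is hypothetical and is not the actual `torch.ops.inductor.accumulate_grad_` implementation; it only illustrates the accumulation behavior that the new op encapsulates instead of emitting `+=` directly in the graph.

```
import torch

def accumulate_grad_sketch(variable: torch.Tensor, new_grad: torch.Tensor) -> None:
    # Hypothetical, simplified model of AccumulateGrad-style accumulation.
    if variable.grad is None:
        # First accumulation: store a detached copy so the saved grad does
        # not keep the backward graph alive.
        variable.grad = new_grad.detach().clone()
    else:
        # Subsequent accumulations add in place, matching `param.grad += g`.
        variable.grad.add_(new_grad)

# Usage example: two accumulations into the same leaf parameter.
p = torch.nn.Parameter(torch.zeros(4))
accumulate_grad_sketch(p, torch.ones(4))
accumulate_grad_sketch(p, torch.ones(4))
print(p.grad)  # tensor([2., 2., 2., 2.])
```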