Decompose torch.ops.higher_order.auto_functionalized in Inductor (#118673)
We'd like to get auto_functionalized to work with AOTInductor. To get
there, we decompose `output = auto_functionalized(inplace_op, ...)` into its
corresponding aten ops (clones + inplace_op) before the Inductor lowering phase.
This decomposition must happen at the end of the Inductor FX passes
because it introduces in-place operations.
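To illustrate the semantics, here is a minimal pure-Python sketch (plain lists stand in for tensors; the names are illustrative, not Inductor's actual API): the functional wrapper clones its mutated input before running the in-place op, and the decomposition makes that clone + in-place call explicit in the graph.

```python
import copy

def my_inplace_op_(buf):
    # Toy in-place op: doubles every element of `buf` in place.
    for i in range(len(buf)):
        buf[i] *= 2

def auto_functionalized(inplace_op, buf):
    # Functional wrapper: never mutates its input.
    # Returns a single output that is a List of the updated buffers.
    new_buf = copy.deepcopy(buf)   # the clone the decomposition makes explicit
    inplace_op(new_buf)
    return [new_buf]

def decomposed(buf):
    # After decomposition, the graph contains the clone + in-place op directly.
    new_buf = copy.deepcopy(buf)
    my_inplace_op_(new_buf)
    return new_buf

x = [1, 2, 3]
(out_wrapped,) = auto_functionalized(my_inplace_op_, x)
out_decomposed = decomposed(x)
assert x == [1, 2, 3]                          # input is never mutated
assert out_wrapped == out_decomposed == [2, 4, 6]
```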
The pattern matcher's "replace this single node with multiple nodes" API
isn't robust enough here. The problem is that `auto_functionalized`
returns a single output (a List), but the decomposition ends up
returning the List's unpacked elements (e.g. it may return two tensors).
Previously, `replace_with_graph` asserted this could not happen; I fixed
it up to handle this case.
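The shape mismatch can be sketched without any FX machinery (a toy model, not the real pattern-matcher API): users of the old node read the List via getitem, so after replacement each `getitem(output, i)` must be rewired to the i-th unpacked output of the new graph.

```python
def original_node():
    # Single output that is a List.
    return [10, 20]

def replacement_graph():
    # The decomposition returns the unpacked elements instead.
    return 10, 20

# Users of the old node access its results via getitem on the List.
old = original_node()
users = [old[0] + 1, old[1] + 1]

# After replacement, getitem(old, i) must map to new_outputs[i].
new_outputs = replacement_graph()
rewired = [new_outputs[0] + 1, new_outputs[1] + 1]
assert users == rewired == [11, 21]
```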
Future: Not all of the clones are necessary (e.g. if the input's last
usage is this operator, then we don't need to clone it). We can add this
logic later.
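The future optimization amounts to a last-use check before emitting each clone; a hypothetical sketch (the function and flag names are made up for illustration):

```python
def decompose(node, input_has_later_users):
    # Only clone when the mutated input is still needed after this operator;
    # if this op is the input's last use, mutating it directly is safe.
    steps = []
    if input_has_later_users:
        steps.append("clone")
    steps.append("inplace_op")
    return steps

assert decompose("n", input_has_later_users=True) == ["clone", "inplace_op"]
assert decompose("n", input_has_later_users=False) == ["inplace_op"]
```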
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118673
Approved by: https://github.com/oulgen