Fix inductor <> ddp_optimizer issue (#108081)
@wconstab pointed out that inductor found a graph with 6 input mutations and only 1 output, and seemed to be (incorrectly) chopping off the first 6 outputs from the graph, even though there is only 1. It looks like this is because:
AOTAutograd has special handling for input mutations in inference vs. training graphs:
(1) In a training graph, whenever AOTAutograd sees an input mutation, it will add an **extra** output to the graph, corresponding to the updated input (and then at runtime, it will grab the updated input and perform the actual mutation outside of the graph).
(2) In inference, AOTAutograd is smarter and can leave the input mutations directly in the graph for inductor to optimize (doing this in training is harder). In inference, AOTAutograd will **not** add any extra graph outputs for input mutations.
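The two conventions can be sketched in plain Python (a conceptual illustration only, not the actual AOTAutograd code; lists stand in for tensors and `x[i] += 1` stands in for an in-place tensor mutation):

```python
# A user function that mutates its first input (x += 1) and returns x + y.

def training_graph(x, y):
    # Training convention: the mutation is functionalized. The graph
    # returns the *updated* value of x as an extra output instead of
    # mutating x itself.
    updated_x = [v + 1 for v in x]
    out = [a + b for a, b in zip(updated_x, y)]
    return updated_x, out  # one extra output per mutated input

def run_training(x, y):
    # Runtime wrapper: grab the extra output and perform the actual
    # mutation outside of the graph.
    updated_x, out = training_graph(x, y)
    x[:] = updated_x
    return out

def inference_graph(x, y):
    # Inference convention: the mutation stays in the graph (for the
    # compiler to optimize); no extra outputs are added.
    for i in range(len(x)):
        x[i] += 1
    return [a + b for a, b in zip(x, y)]

x1, x2 = [1, 2], [1, 2]
out_train = run_training(x1, [10, 20])
out_inf = inference_graph(x2, [10, 20])
assert x1 == x2 == [2, 3]            # both paths observe the mutation
assert out_train == out_inf == [12, 23]
```

Note that `training_graph` has 2 outputs for 1 user-visible result, while `inference_graph` has exactly 1 — which is why output counting must be mode-aware.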
It looks like inductor was unconditionally assuming that input mutations counted as extra outputs in the graph, which is wrong for the inference case.
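The shape of the fix can be sketched as follows (hypothetical helper name and signature, purely for illustration; the real Inductor internals differ):

```python
def num_user_outputs(total_graph_outputs, num_mutated_inputs, is_inference):
    """Count the user-visible outputs of a compiled graph.

    Before the fix, the code effectively did
    `total_graph_outputs - num_mutated_inputs` unconditionally. That is
    only correct for training graphs, where each mutated input adds one
    extra graph output; inference graphs keep mutations in-graph and
    add no extra outputs.
    """
    extra = 0 if is_inference else num_mutated_inputs
    return total_graph_outputs - extra

# The reported case: 6 input mutations, 1 real output.
# Inference graph: 1 total output, nothing to chop off.
assert num_user_outputs(1, 6, is_inference=True) == 1
# Training graph: 6 extra outputs prepended, 7 total.
assert num_user_outputs(7, 6, is_inference=False) == 1
```

With the old unconditional subtraction, the inference case above would have computed 1 - 6 = -5 user outputs, i.e. the observed behavior of chopping off outputs that were never there.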
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108081
Approved by: https://github.com/wconstab