[Profiler] Memory profiler part 5: Data flow graph (#87006)
The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However, one important nuance is that Tensors must be versioned: operations which mutate their inputs can also change the semantic meaning of those inputs.
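To illustrate why versioning matters, here is a minimal sketch in plain Python. It is not the code in this PR; `VersionedTensor` and `VersionTracker` are hypothetical names, and integer IDs stand in for real Tensors. The idea is that a mutating op yields a new (id, version) pair, so the graph can tell "the buffer before the in-place op" apart from "the buffer after it".

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VersionedTensor:
    """Hypothetical graph key: a Tensor identity plus a mutation counter."""
    tensor_id: int
    version: int


class VersionTracker:
    """Hypothetical helper that hands out versioned keys for Tensor IDs."""

    def __init__(self) -> None:
        self._versions: Dict[int, int] = {}

    def read(self, tensor_id: int) -> VersionedTensor:
        # A read does not change the version; unseen Tensors start at 0.
        return VersionedTensor(tensor_id, self._versions.setdefault(tensor_id, 0))

    def mutate(self, tensor_id: int) -> Tuple[VersionedTensor, VersionedTensor]:
        # An in-place op consumes the current version and produces the next one.
        before = self.read(tensor_id)
        self._versions[tensor_id] = before.version + 1
        return before, self.read(tensor_id)


tracker = VersionTracker()
first_read = tracker.read(7)       # e.g. a freshly allocated buffer
before, after = tracker.mutate(7)  # e.g. an in-place op such as `add_`
```

Here `before` equals `first_read`, while `after` is a distinct key for the same underlying Tensor, so edges into and out of the mutating op refer to different graph entries.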
It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purposes of memory profiling this recursion is an implementation detail, so we instead traverse the op tree to identify top-level ops and allocations, then coalesce their children, folding the children's inputs and outputs into the top-level Node.
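The coalescing step can be sketched roughly as follows. This is a simplified illustration in plain Python, not the implementation in this PR: `OpEvent`, `Node`, and `coalesce` are hypothetical names, Tensors are represented by strings, and versioning is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class OpEvent:
    """Hypothetical profiler event: one op plus the ops it called into."""
    name: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)
    children: List["OpEvent"] = field(default_factory=list)


@dataclass
class Node:
    """Hypothetical graph node for one top-level op."""
    name: str
    inputs: Set[str]
    outputs: Set[str]


def coalesce(root: OpEvent) -> Node:
    """Fold the inputs and outputs of every descendant into one Node."""
    inputs: Set[str] = set()
    outputs: Set[str] = set()
    stack = [root]
    while stack:  # iterative depth-first walk over the op tree
        event = stack.pop()
        inputs |= event.inputs
        outputs |= event.outputs
        stack.extend(event.children)
    return Node(root.name, inputs, outputs)


# A top-level op that calls into two child ops.
linear = OpEvent(
    "linear",
    inputs={"x", "w", "b"},
    outputs={"y"},
    children=[
        OpEvent("t", inputs={"w"}, outputs={"w_t"}),
        OpEvent("addmm", inputs={"b", "x", "w_t"}, outputs={"y"}),
    ],
)
node = coalesce(linear)
```

In this sketch the intermediate `w_t` shows up in both `node.inputs` and `node.outputs`, because it is produced by one child and consumed by another. A fuller treatment would need to classify such Tensors separately.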
Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
Approved by: https://github.com/chaekit