[PyTorch] Add Expanded call stack to nodes [Take 2] (#110229)
Summary:
Adding back D46578700 / PR https://github.com/pytorch/pytorch/pull/108426
Note: The changes were originally reverted due to memory regression, these changes are putting the code behind a gflag so it is only used by binaries that require expanded stack for BPF Profiling.
Original Diff comment:
To get a Node's call stack we currently loop on the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change we can optimize this but expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g. BPF) as it simplified the memory traversal process.
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.
Each Node has an extended attribute that has an index to a vector of stack frames expanded_node_stacks_
node_stack_attr_symbol_ is only needed to make accessing the stack vector index attribute easier from BPF.
Test Plan:
- Verified using BPF Program in subsequent diffs
- Perf testing for loading large model: P822455246
Differential Revision: D49565461
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110229
Approved by: https://github.com/zdevito