[PyTorch] Add Expanded call stack to nodes (#108426)
Summary:
To get a Node's call stack, we currently walk the InlinedCallStack graph by following the "callee" chain. Since a node's inlined stack does not change, we can optimize this by expanding the stack once and reusing the result. This is particularly useful when reading the node's stack from another process (e.g. BPF), as it simplifies the memory traversal.
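A minimal C++ sketch of the expand-once idea, assuming a simplified, hypothetical `CallStackEntry` type in place of the real InlinedCallStack (whose API differs). The chain is walked a single time and flattened into a contiguous vector of frames that point at strings owned by the entries, so later readers can index the vector instead of chasing `callee` pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an inlined call stack entry: each entry
// names its function and source location and points at the next ("callee") entry.
struct CallStackEntry {
  std::string function_name;
  std::string file_name;
  int line;
  std::shared_ptr<CallStackEntry> callee;  // nullptr terminates the chain
};

// Hypothetical flat frame. It only stores pointers to strings owned by the
// entries, so those entries must outlive the frame.
struct Frame {
  const std::string* function_name;
  const std::string* file_name;
  int line;
};

// Walk the callee chain once and copy it, in chain order, into a vector.
std::vector<Frame> expandCallStack(const std::shared_ptr<CallStackEntry>& head) {
  std::vector<Frame> frames;
  for (const CallStackEntry* e = head.get(); e != nullptr; e = e->callee.get()) {
    frames.push_back(Frame{&e->function_name, &e->file_name, e->line});
  }
  return frames;
}
```

Because the stack never changes after inlining, the one-time walk replaces a walk on every read; the trade-off is that the frames are only valid while the entries they point into stay alive.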
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects remain alive for the lifetime of the process.
Each Node has an extended attribute holding an index into `expanded_node_stacks_`, which stores the expanded stack frames.
`node_stack_attr_symbol_` exists only to make the stack-index attribute easier to access from BPF.
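An in-process C++ sketch of the node-to-stack indirection, under stated assumptions: the `NodeSourceInfo` field layout, the `Node` type with a string-keyed integer attribute map, the `kNodeStackAttr` key (standing in for `node_stack_attr_symbol_`) and the `ExpandedStacks` class are all hypothetical simplifications, not the actual PyTorch types. It shows that resolving a node's stack becomes one attribute lookup plus one vector index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical frame layout: only pointers to names owned elsewhere, which must
// stay alive for as long as anything reads the frames.
struct NodeSourceInfo {
  const std::string* function_name;
  const std::string* file_name;
  int line;
};

// Hypothetical attribute key standing in for node_stack_attr_symbol_.
const std::string kNodeStackAttr = "node_stack_idx";

// Simplified node with integer attributes (a stand-in for the real Node).
struct Node {
  std::unordered_map<std::string, int64_t> int_attrs;
};

class ExpandedStacks {
 public:
  // Store an already-expanded stack and tag the node with its index.
  void attach(Node& node, std::vector<NodeSourceInfo> frames) {
    expanded_node_stacks_.push_back(std::move(frames));
    node.int_attrs[kNodeStackAttr] =
        static_cast<int64_t>(expanded_node_stacks_.size() - 1);
  }

  // Attribute lookup plus one vector index; no callee chain to follow.
  // The returned reference is invalidated by a later attach().
  const std::vector<NodeSourceInfo>& stackFor(const Node& node) const {
    int64_t idx = node.int_attrs.at(kNodeStackAttr);
    assert(idx >= 0 &&
           static_cast<std::size_t>(idx) < expanded_node_stacks_.size());
    return expanded_node_stacks_[static_cast<std::size_t>(idx)];
  }

 private:
  std::vector<std::vector<NodeSourceInfo>> expanded_node_stacks_;
};
```

A reader in another process would follow the same two steps over raw memory (read the index attribute, then the frame vector), which is the simplification the flat layout is meant to buy.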
Test Plan:
- Performance impact: expanding the call stack costs between 500 and 1000 ns per instruction node and is done only once, at initialization time.
- Verified using a BPF program in subsequent diffs.
Reviewed By: zdevito
Differential Revision: D46578700
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108426
Approved by: https://github.com/zdevito