[ONNX][Diagnostics] Speed up 'python_call_stack' by 'traceback' (#96348)
`inspect.stack()` collects source context for every frame on the call stack, which is slow.
`inspect.stack(0)` is much faster, but loses the source line snippet.
Rewrite with `traceback.extract_stack`, which is fast and still provides the line snippet.
Speeds up the `export` call in `test_gpt2_tiny` from ~30s to ~4s under profiling.
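For illustration, a minimal sketch of the three approaches compared above. The helper names are hypothetical and are not the ones used in `torch.onnx`; only the standard-library calls `inspect.stack` and `traceback.extract_stack` are taken from this change. Timings depend on stack depth and environment, so none are quoted here.

```python
import inspect
import timeit
import traceback


def stack_via_inspect():
    """Hypothetical helper: capture the stack with inspect.stack().

    The default context=1 asks inspect to look up one source line per frame.
    Frames are returned innermost first.
    """
    frames = inspect.stack()
    return [(f.filename, f.lineno, f.function, f.code_context) for f in frames]


def stack_via_inspect_no_context():
    """Hypothetical helper: inspect.stack(0) skips the source lookup.

    code_context is None for every frame, so the line snippet is lost.
    """
    frames = inspect.stack(0)
    return [(f.filename, f.lineno, f.function, f.code_context) for f in frames]


def stack_via_traceback():
    """Hypothetical helper: capture the stack with traceback.extract_stack().

    Each FrameSummary carries filename, lineno, name and the source line.
    Frames are returned outermost first, so the caller is the last entry.
    """
    frames = traceback.extract_stack()
    return [(f.filename, f.lineno, f.name, f.line) for f in frames]


if __name__ == "__main__":
    # Rough comparison only; absolute numbers vary with stack depth.
    for fn in (stack_via_inspect, stack_via_inspect_no_context, stack_via_traceback):
        seconds = timeit.timeit(fn, number=200)
        print(f"{fn.__name__}: {seconds:.4f}s for 200 calls")
```

Note the ordering difference: code that switches from `inspect.stack()` to `traceback.extract_stack()` may need to reverse the result if it expects the innermost frame first.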
Before
```log
│...├─ 30.794 export_after_normalizing_args_and_kwargs <@beartype(torch.onnx._internal.fx.exporter.export_after_normalizing_args_and_kwargs) at 0x7f815cba0700>:1
│...│ └─ 30.794 export_after_normalizing_args_and_kwargs torch/onnx/_internal/fx/exporter.py:580
```
After
```log
│...├─ 4.427 export_after_normalizing_args_and_kwargs <@beartype(torch.onnx._internal.fx.exporter.export_after_normalizing_args_and_kwargs) at 0x7fd8281b3700>:1
│...│ └─ 4.427 export_after_normalizing_args_and_kwargs torch/onnx/_internal/fx/exporter.py:580
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96348
Approved by: https://github.com/titaiwangms, https://github.com/justinchuby