add fx2trt diagnostics (and a framework) (#72374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72374
This change:
1. Collects various diagnostics along the fx2trt code path (currently: the lowering module graph/code, the trt interpreter input module graph, and each trt split module graph)
2. Uploads them when an error happens
3. Provides a framework for easily collecting more diagnostics
The diagnostics framework has the following features:
1. easy to use (see example)
2. safe to use - diagnostics should never throw exceptions into business logic; this includes errors raised while generating, writing, or uploading diagnostics
3. concurrency-safe, i.e., diagnostics collected on different execution contexts (threads, asyncio tasks/coroutines) are isolated from each other.
Example:
```
import pickle

import torch.fx.experimental.diagnostics as diag

with diag.collect_when_fail():
    ...
    # in places where you want to dump diagnostics:
    diag.write("module.graph", str(module.graph))
    diag.write("some_bytes", pickle.dumps(module.some_param))
    # also supports retrieving data from a lambda. If it throws, it won't
    # impact business logic:
    diag.write("some_data_2", lambda: this_might_throw())
    ...
    some_code_that_throws()  # this will trigger diagnostics to be uploaded
    ...
```
Note:
* `write()` dumps a diagnostic to a temporary file.
* `collect_when_fail()` collects and zips all the dumped diagnostic files, then uploads them.
* `collect_when_fail()` is already in place at the appropriate point in the fx2trt code path, so there is no need to call it again. Collecting a new diagnostic only requires adding a `write(...)` call.
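The division of labor described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `collect_when_fail_sketch` and the `on_archive` callback are hypothetical, the callback stands in for the real upload step, and the sketch hands `write` to the caller instead of exposing it at module level as `diag.write` does. Temporary-directory cleanup is omitted for brevity.

```python
import contextlib
import os
import tempfile
import zipfile


@contextlib.contextmanager
def collect_when_fail_sketch(on_archive):
    """Yield write(name, data); zip the written files only if the body raises.

    Hypothetical sketch: `on_archive` stands in for the real upload step.
    """
    tmp_dir = tempfile.mkdtemp(prefix="diag_sketch_")

    def write(name, data):
        try:
            payload = data() if callable(data) else data
            if isinstance(payload, str):
                payload = payload.encode("utf-8")
            with open(os.path.join(tmp_dir, name), "wb") as f:
                f.write(payload)
        except Exception:
            pass  # diagnostics must never break business logic

    try:
        yield write
    except Exception:
        try:
            archive = os.path.join(tmp_dir, "diagnostics.zip")
            with zipfile.ZipFile(archive, "w") as zf:
                for fname in sorted(os.listdir(tmp_dir)):
                    if fname != "diagnostics.zip":
                        zf.write(os.path.join(tmp_dir, fname), arcname=fname)
            on_archive(archive)
        except Exception:
            pass  # archiving or upload errors are swallowed as well
        raise  # the original failure still propagates to the caller


archives = []
try:
    with collect_when_fail_sketch(archives.append) as write:
        write("module.graph", "graph text")
        write("broken", lambda: 1 / 0)  # swallowed, no file is written
        raise RuntimeError("lowering failed")
except RuntimeError:
    pass
```

Only the failing run produces an archive. A body that completes normally leaves the dumped files unzipped and never invokes the callback.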
Reviewed By: yinghai
Differential Revision: D33991890
fbshipit-source-id: ac736c57c0fdb3e6c5bf74089baf10c1ccf3631b
(cherry picked from commit 3dae23fd595eaa198177c89f2bcc68c18c5f778b)