pytorch
fed0ec30 - add fx2trt diagnostics (and a framework) (#72374)

Commit
2 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72374

This change:
1. Collects various diagnostics along the fx2trt code path (currently: the lowering module graph/code, the TRT interpreter input module graph, and each TRT split module graph)
2. Uploads them when an error happens
3. Provides a framework for easily collecting more diagnostics

The diagnostics framework has the following features:
1. Easy to use (see example)
2. Safe to use: diagnostics should never throw exceptions into business logic; this includes errors while generating, writing, or uploading diagnostics
3. Concurrency-safe, i.e., diagnostics collected in different execution contexts (threads, asyncio tasks/coroutines) are isolated from each other

Example:
```
import pickle

import torch.fx.experimental.diagnostics as diag

with diag.collect_when_fail():
    ...
    # in places where you want to dump diagnostics:
    diag.write("module.graph", str(module.graph))
    diag.write("some_bytes", pickle.dumps(module.some_param))
    # also supports retrieving data from a lambda. If it throws, it will not
    # impact business logic:
    diag.write("some_data_2", lambda: this_might_throw())
    ...
    some_code_that_throws()  # this will trigger diagnostics to be uploaded
    ...
```

Note:
* `write()` dumps a diagnostic to a tmp file
* `collect_when_fail()` collects and zips all the dumped diagnostic files and uploads them
* `collect_when_fail()` is already in the appropriate place, so it does not need to be called again; all that needs to be added is `write(...)`

Reviewed By: yinghai

Differential Revision: D33991890

fbshipit-source-id: ac736c57c0fdb3e6c5bf74089baf10c1ccf3631b
(cherry picked from commit 3dae23fd595eaa198177c89f2bcc68c18c5f778b)