benchmark
3dd99cb3 - Fix duration logging for dynamo_compile (#151749)

Commit

1 year ago

Fix duration logging for dynamo_compile (#151749)

Summary:
There are a few issues I'm solving:

1. It's too hard to measure total pt2 overhead using the dynamo_compile table because users need to know the columns representing all the top-level events (dynamo_cumulative_compile_time_us, etc.). Instead, let's populate the existing duration_us field for all top-level events. The complication is that runtime events in particular (Triton autotuning, cudagraphify) can be collapsed into a single row, with gaps in between, so we can't simply use `end_time - start_time` in all cases. Instead, we'll sum durations for all outer events when updating the compile-time or runtime metrics context. Introduce a 'depth' counter in TLS to track the nesting of CompilationMetrics events.

2. The existing implementation relies on callers of dynamo_timed to specify whether the event is a runtime or compile-time event. That doesn't work because some methods can be called in both situations, e.g., `CachingAutotuner.benchmark_all_configs`. For example, `TORCHINDUCTOR_BENCHMARK_FUSION=1` enables benchmarking during compile-time. Instead, we can figure out automatically whether we're measuring a compile-time or runtime event and log accordingly.

3. If `log_compilation_events` were to throw an exception, we'd fail to clear the aggregated counters for runtime logs and they could be attributed to the wrong compile ID. I didn't actually find evidence of this in practice, but I added exception handling for extra safety.

X-link: https://github.com/pytorch/pytorch/pull/151749
Approved by: https://github.com/Skylion007

Reviewed By: wdvr

Differential Revision: D73440137

fbshipit-source-id: 7f176a9ffb4a87bc7176cf737f4bed04a5879a34

Author

masnesral

Committer

facebook-github-bot

Parents

92a05f1c
