benchmark
f291d7be - Log cudagraphify timings to dynamo_timed (#143220)

Commit
325 days ago
Log cudagraphify timings to dynamo_timed (#143220)

Summary: this adds some new dynamo_timed calls in cudagraph_trees, primarily with the aim to add cudagraph-related timing to scuba.

Things to note:
* Uses the changes in https://github.com/pytorch/pytorch/pull/141919 to log "runtime" entries
* The logging for chromium/tlparse/scuba relies on us providing a compile_id since it's not available in the environment. A lot of the changes here are just passing around the compile_id
* I believe the spirit of the scuba logging is to capture the overheads of `torch.compile`. Therefore, I'm not adding _every_ dynamo_timed to scuba. For example, "run_eager" is the first real execution of the inductor graph -- it's not cudagraph overhead, per se. Watch out for the two instances of `dynamo_compile_runtime_column_us="runtime_cudagraphify_time_us"`. Those are the spots I believe are _extra_ overhead we'd contribute to torch.compile.

X-link: https://github.com/pytorch/pytorch/pull/143220
Approved by: https://github.com/eellison

Reviewed By: jovianjaison

Differential Revision: D70817601

fbshipit-source-id: 147ddccb31c15bd9ff13d86d862a0bd4012eec9b