Log individual Triton kernel compilation times to dynamo_compile (#147022)
Summary:
Gather the compilation times of individual Triton kernels and log them to dynamo_compile:
* Time compilation in `_worker_compile_triton`, pass the elapsed time back to the main process, and log it from `get_result()`.
* Add a way to track the "top N" (the N most expensive compiles) in the metrics_context. I doubt we care to capture potentially thousands of kernel compile times, and that volume would be problematic for Scuba logging anyway, so limit the number we track from the beginning. The limit of 25 is an arbitrary choice for now.
* Format the list of compile times as a JSON string before logging.
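
For illustration only, the flow in the bullets above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `TopNTracker`, `timed_compile`, and `format_for_logging` are hypothetical names, and the min-heap and the microsecond unit are assumptions made for the example.

```python
import heapq
import json
import time


class TopNTracker:
    """Hypothetical helper: keep only the N largest (value, key) entries."""

    def __init__(self, at_most=25):
        self.at_most = at_most
        self._heap = []  # min-heap, so the smallest kept entry is at index 0

    def add(self, key, value):
        entry = (value, key)
        if len(self._heap) < self.at_most:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop whichever entry is now smallest.
            heapq.heappushpop(self._heap, entry)

    def sorted_items(self):
        # Most expensive first.
        return [(key, value) for value, key in sorted(self._heap, reverse=True)]


def timed_compile(kernel_name, compile_fn):
    """Hypothetical worker-side wrapper: run compile_fn and time it.

    Returns the compile result plus (name, elapsed microseconds) so the
    caller in the main process can record the timing.
    """
    start = time.perf_counter_ns()
    result = compile_fn()
    elapsed_us = (time.perf_counter_ns() - start) // 1000
    return result, kernel_name, elapsed_us


def format_for_logging(tracker):
    """Serialize the tracked compile times as a JSON string."""
    return json.dumps(tracker.sorted_items())


def demo():
    tracker = TopNTracker(at_most=2)
    for name, elapsed_us in [("k0", 300), ("k1", 100), ("k2", 500)]:
        tracker.add(name, elapsed_us)
    return format_for_logging(tracker)
```

With a limit of 2, `demo()` keeps only the two slowest kernels, so the logged string stays bounded however many kernels are compiled.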
X-link: https://github.com/pytorch/pytorch/pull/147022
Approved by: https://github.com/jamesjwu
Reviewed By: wdvr
Differential Revision: D70512505
fbshipit-source-id: b0b26cea64a4d3f34e3386bf42ea203de46a6e3b