update compile time benchmarks to dump compile times to stdout and csv (#145447)
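Summary:
The compile time benchmarks now report compilation latency in two places: as a `compilation_latency` column in the output CSV, and as a `Compilation time (from dynamo_timed)` line on stdout. Example output for `cait_m36_384`: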
```csv
# inductor.csv
dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips,compilation_latency
cuda,cait_m36_384,8,pass,2510,1,0,0,0,0,0,87.705186
```
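A minimal sketch of how a downstream script might consume the new column, assuming the CSV layout shown above. The helper name `read_compile_latencies` is hypothetical and not part of the benchmark code, and treating the value as seconds is an assumption based on the `TIMING` output further down.

```python
import csv
import io

# Sample copied from the example above; a real script would open inductor.csv instead.
SAMPLE_CSV = (
    "dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,"
    "unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips,"
    "compilation_latency\n"
    "cuda,cait_m36_384,8,pass,2510,1,0,0,0,0,0,87.705186\n"
)


def read_compile_latencies(csv_text):
    """Hypothetical helper: map each model name to its compilation_latency.

    The unit is assumed to be seconds.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: float(row["compilation_latency"]) for row in reader}


latencies = read_compile_latencies(SAMPLE_CSV)
for name, seconds in latencies.items():
    print(f"{name}: {seconds:.2f}s")
```

Keying on the `compilation_latency` header rather than a column index keeps the reader working if columns are added or reordered later.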
```text
loading model: 0it [01:27, ?it/s]
cuda eval cait_m36_384
Compilation time (from dynamo_timed): 87.705186276 # <----------------
pass
TIMING: _recursive_pre_grad_passes:0.11023 pad_mm_benchmark:0.50341 _recursive_joint_graph_passes:3.88557 _recursive_post_grad_passes:6.71182 async_compile.wait:4.16914 code_gen:17.57586 inductor_compile:42.55769 backend_compile:72.47122 entire_frame_compile:87.70519 gc:0.00112 total_wall_time:87.70519
STATS: call_* op count: 2510 | FakeTensorMode.__torch_dispatch__:101743 | FakeTensor.__torch_dispatch__:12959 | ProxyTorchDispatchMode.__torch_dispatch__:41079
Dynamo produced 1 graphs covering 2510 ops with 0 graph breaks (0 unique)
```
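For log scraping, the stdout line can be picked out with a regular expression. This is a sketch that assumes the line format matches the example above exactly; `extract_compile_time` is a hypothetical helper, and the `# <---` arrow in the example is treated as an annotation rather than real output.

```python
import re

# Excerpt of the stdout example above, without the annotation arrow.
SAMPLE_STDOUT = (
    "cuda eval cait_m36_384\n"
    "Compilation time (from dynamo_timed): 87.705186276\n"
    "pass\n"
)

# Matches the line prefix literally and captures the numeric value after it.
COMPILE_TIME_RE = re.compile(
    r"^Compilation time \(from dynamo_timed\): ([0-9.]+)", re.MULTILINE
)


def extract_compile_time(log_text):
    """Hypothetical helper: return the reported compile time, or None if absent."""
    match = COMPILE_TIME_RE.search(log_text)
    return float(match.group(1)) if match else None


compile_time = extract_compile_time(SAMPLE_STDOUT)
print(compile_time)
```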
X-link: https://github.com/pytorch/pytorch/pull/145447
Approved by: https://github.com/ezyang
Reviewed By: izaitsevfb
Differential Revision: D68570811
fbshipit-source-id: c7101c08a3435fa3567bce505f73eda86d056d63