Add compilation time support (#1547)
Summary:
Measure the latency of the warmup run and use it to report the PT2/Inductor compilation time.
Fixes https://github.com/pytorch/benchmark/issues/1546
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1547
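Illustration only, not the code in this PR: a minimal sketch of the idea in the summary, assuming compilation happens lazily on the first (warmup) call. It times the first call, times a few later calls, and treats the difference as the compilation time. The names `time_call`, `estimate_compile_time`, and `FakeCompiledStep` are hypothetical, and the fake step stands in for a real compiled train step. For a CUDA workload the timed callable would presumably also need to call `torch.cuda.synchronize()` so that asynchronous kernels are included in the measurement.

```python
import time


def time_call(fn):
    """Return the wall-clock duration of a single call to fn, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def estimate_compile_time(fn, steady_iters=3):
    """Estimate compile time as warmup latency minus steady-state latency.

    Assumption: fn compiles lazily on its first call, so the first call
    pays for compilation plus one normal run. Returns a tuple
    (estimated_compile_seconds, steady_state_seconds).
    """
    warmup = time_call(fn)  # first call: compile + run
    # Take the fastest of the later calls as the steady-state latency.
    steady = min(time_call(fn) for _ in range(steady_iters))
    return max(warmup - steady, 0.0), steady


class FakeCompiledStep:
    """Hypothetical stand-in for a compiled step: slow on the first call only."""

    def __init__(self, compile_s=0.2, run_s=0.01):
        self.compile_s = compile_s
        self.run_s = run_s
        self.compiled = False

    def __call__(self):
        if not self.compiled:
            time.sleep(self.compile_s)  # simulated one-time compilation cost
            self.compiled = True
        time.sleep(self.run_s)  # simulated per-iteration cost


if __name__ == "__main__":
    compile_s, steady_s = estimate_compile_time(FakeCompiledStep())
    print(f"Estimated compilation time: {compile_s:.3f} seconds")
    print(f"Steady-state latency: {steady_s * 1000:.3f} milliseconds")
```

Subtracting the steady-state latency is a design choice of this sketch; when compilation dominates the warmup run, as in the test plan output below, the warmup latency alone is already a close approximation.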
Test Plan:
```
$ python run.py resnet50 -d cuda -t train --torchdynamo inductor
Running train method from resnet50 on cuda in dynamo inductor mode with input batch size 32 and precision fp32.
GPU Time: 35.727 milliseconds
CPU Total Wall Time: 35.774 milliseconds
GPU 0 Peak Memory: 6.5234 GB
CPU Peak Memory: 1.2148 GB
Correctness: False
PT2 Compilation time: 27.919 seconds
```
Reviewed By: desertfire
Differential Revision: D44851940
Pulled By: xuzhao9
fbshipit-source-id: d185e90b3e595855df576dfc608d5413f74779b5