Distinguish compile time from run time (#1865)
Summary:
This PR separates compile time from run time so that the two are tracked as two numbers; previously, the metrics measured end-to-end (e2e) compile + run time.
Today, SGD with foreach and momentum on CUDA with pt2 takes ~2.36s on average. I expect the run time to stay close to that after this change, while the compile time will be reported as a separate, much larger value.
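One way to get two separate numbers is to time the first call of a lazily compiled step on its own and then average the later calls, since `torch.compile` compiles on the first invocation. The sketch below shows that idea using only the standard library. It assumes the first call carries the one-off compilation cost. `time_compile_and_run` and its parameters are hypothetical and are not this PR's implementation.

```python
import time
from typing import Callable, Tuple


def time_compile_and_run(
    step: Callable[[], None],
    n_iters: int = 10,
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Hypothetical helper: return (compile_time, avg_runtime) in seconds.

    Assumption: the first call to `step` pays the one-off compilation cost
    (as with a lazily compiled function) and later calls do not.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be >= 1")

    # First call: compilation plus one execution of the step.
    start = timer()
    step()
    compile_time = timer() - start

    # Later calls: steady-state execution only, averaged over n_iters.
    start = timer()
    for _ in range(n_iters):
        step()
    avg_runtime = (timer() - start) / n_iters

    return compile_time, avg_runtime
```

With a real model, `step` would wrap something like a compiled optimizer step. On CUDA, you would also need to call `torch.cuda.synchronize()` before each timer read, because kernels launch asynchronously. The `timer` parameter is there so that the helper can be tested with a fake clock.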
Current results:
<img width="1369" alt="image" src="https://github.com/pytorch/benchmark/assets/31798555/16d366da-3361-4540-a2a2-bc22f51965e6">
The metrics dictionary used to have entries like:
```
{
  "name": "optim",
  "environ": {
    "pytorch_git_version": "c527d0fadd27595ba2f98dd0f57aae5c56658d71"
  },
  "metrics": {
    "resnet18, Adam, cuda, (pt2) default": 0.0018993107215413507,
    "resnet18, Adam, cuda, default": 0.0010943956114351748,
    "resnet18, Adam, cuda, (pt2) amsgrad, maximize": 0.002033790648736135,
    "resnet18, Adam, cuda, amsgrad, maximize": 0.0013529009232297541,
    "resnet18, Adam, cuda, (pt2) no_foreach": 0.005578947072434757,
    ...
  }
}
```
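In the sample entries, each key joins the model, optimizer, device and a description of the optimizer configuration (for example `default` or `amsgrad, maximize`) with `", "`. For pt2 runs, the description starts with `"(pt2) "`. The sketch below reproduces that format. `metric_key` is a hypothetical helper inferred from the sample keys and is not the benchmark's code.

```python
def metric_key(model: str, optim: str, device: str, desc: str, pt2: bool) -> str:
    """Hypothetical helper reproducing the key format seen in the sample output."""
    # pt2 (torch.compile) runs mark the configuration description with "(pt2) ".
    prefix = "(pt2) " if pt2 else ""
    return f"{model}, {optim}, {device}, {prefix}{desc}"
```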
Now, keys for compile-time measurements begin with "compile_time":
```
{
  "name": "optim",
  "environ": {
    "pytorch_git_version": "a005f70a4284152e00c8f6603feaf4ab9636f6aa"
  },
  "metrics": {
    "resnet18, SGD, cuda, (pt2) no_foreach": 0.0017500566132366657,
    "resnet18, SGD, cuda, no_foreach": 0.0025729038193821907,
    "resnet18, SGD, cuda, (pt2) foreach": 0.0017613966017961502,
    ...
    "resnet18, SGD, cpu, foreach, momentum=0.9": 0.08240865767002106,
    "compile_time, resnet18, SGD, cuda, (pt2) no_foreach": 14.877577589824796,
    "compile_time, resnet18, SGD, cuda, (pt2) foreach": 0.6698574535548687,
    "compile_time, resnet18, SGD, cuda, (pt2) foreach, momentum=0.9, nesterov": 0.32723781156043213,
    ...
    "compile_time, resnet18, SGD, cpu, (pt2) foreach, momentum=0.9": 0.29321490600705147
  }
}
```
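Runtime and compile-time entries now share one dictionary. A consumer can separate them by checking for the `"compile_time, "` prefix. The sketch below does this. It strips the prefix so that each compile time lines up with the run time of the same pt2 configuration. `split_metrics` is a hypothetical helper and is not part of the benchmark.

```python
from typing import Dict, Tuple

# Prefix observed on compile-time keys in the sample output.
COMPILE_PREFIX = "compile_time, "


def split_metrics(metrics: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical helper: split a metrics dict into (runtimes, compile_times).

    Compile-time keys have the prefix stripped so they match the
    corresponding runtime keys.
    """
    runtimes: Dict[str, float] = {}
    compile_times: Dict[str, float] = {}
    for key, value in metrics.items():
        if key.startswith(COMPILE_PREFIX):
            compile_times[key[len(COMPILE_PREFIX):]] = value
        else:
            runtimes[key] = value
    return runtimes, compile_times


metrics = {
    "resnet18, SGD, cuda, (pt2) foreach": 0.0017613966017961502,
    "compile_time, resnet18, SGD, cuda, (pt2) foreach": 0.6698574535548687,
}
runtimes, compile_times = split_metrics(metrics)
```

Only pt2 configurations have a compile-time counterpart. After the split, eager keys such as `"resnet18, SGD, cuda, no_foreach"` appear only in `runtimes`.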
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1865
Reviewed By: mlazos
Differential Revision: D49022308
Pulled By: janeyx99
fbshipit-source-id: 4a143071d232160b239efc03e04afba611a973af