Add newly_run and no_longer_run metrics to output yaml (#1509)
Summary:
This PR changes the contract for what users need to implement. Previously, users had to handle the case where the two metrics JSONs did not have the same set of keys. Now, we record the mismatches in no_longer_run_in_treatment and newly_run_in_treatment and guarantee that the keys match when the JSONs enter the user-defined run function.
The output YAML would look like:
```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  ...
no_longer_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize: 0.010405212640762329
  BERT_pytorch, Adadelta, cuda, foreach, maximize: 0.009411881134534875
  BERT_pytorch, Adagrad, cuda, (pt2) foreach, maximize: 0.03404413016202549
  ...
newly_run_in_treatment:
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable: 0.0033336214274944116
  BERT_pytorch, Adadelta, cuda, differentiable: 0.017110475042136385
  BERT_pytorch, Adagrad, cuda, (pt2) differentiable: 0.003775304475500477
  BERT_pytorch, Adagrad, cuda, differentiable: 0.007527894619852304
  BERT_pytorch, Adam, cuda, (pt2) amsgrad, maximize: 0.00928849776127291
  ...
```
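For illustration, the key partitioning described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the function name `split_metrics` and the flat `{name: value}` shape of the metrics dicts are assumptions made for the example.

```python
from typing import Dict, Tuple

Metrics = Dict[str, float]


def split_metrics(
    control: Metrics, treatment: Metrics
) -> Tuple[Metrics, Metrics, Metrics, Metrics]:
    """Hypothetical helper: separate shared keys from mismatched ones."""
    # Keys measured only in control were dropped by the treatment run.
    no_longer_run = {k: v for k, v in control.items() if k not in treatment}
    # Keys measured only in treatment are new in the treatment run.
    newly_run = {k: v for k, v in treatment.items() if k not in control}
    # Only keys present on both sides are passed on to the run function.
    matched_control = {k: v for k, v in control.items() if k in treatment}
    matched_treatment = {k: treatment[k] for k in matched_control}
    return matched_control, matched_treatment, no_longer_run, newly_run


control = {
    "BERT_pytorch, Adadelta, cuda, default": 0.008748639142140746,
    "BERT_pytorch, Adadelta, cuda, foreach, maximize": 0.009411881134534875,
}
treatment = {
    "BERT_pytorch, Adadelta, cuda, default": 0.008748639142140746,
    "BERT_pytorch, Adadelta, cuda, differentiable": 0.017110475042136385,
}
matched_c, matched_t, no_longer_run, newly_run = split_metrics(control, treatment)
```

With this split, `matched_c` and `matched_t` always share the same keys, which is the guarantee the user-defined run function now relies on.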
A potential downside here is that users may WANT to handle the mismatches themselves, and this removes that knob for them. An alternative would be to let users put in NaNs for the missing values and process the results from the YAML later. This would establish a different kind of contract: NaNs are used whenever the actual measurement is missing. I'm not sure which is better. The YAML would then look like:
```
control_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
treatment_env:
  pytorch_git_version: 00891e96e8f2444785ae908c428514a726c27da8
bisection: null
details:
  BERT_pytorch, Adadelta, cuda, (pt2) default:
    control: 0.009517530572008003
    treatment: 0.009517530572008003
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, default:
    control: 0.008748639142140746
    treatment: 0.008748639142140746
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) maximize:
    control: 0.010465960879810155
    treatment: 0.010465960879810155
    delta: 0.0
  BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize:
    control: 0.010405212640762329
    treatment: NaN
    delta: NaN
  ...
  BERT_pytorch, Adadelta, cuda, (pt2) differentiable:
    control: NaN
    treatment: 0.0033336214274944116
    delta: NaN
  ...
```
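For comparison, the NaN-based alternative could be sketched like this. Again this is only an illustration under assumptions: `details_with_nans` is a hypothetical name, and delta is taken to be a plain difference (treatment minus control), which may not match the actual delta definition used by a given metric.

```python
import math
from typing import Dict

Metrics = Dict[str, float]


def details_with_nans(control: Metrics, treatment: Metrics) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: fill missing measurements with NaN."""
    details = {}
    for key in sorted(control.keys() | treatment.keys()):
        c = control.get(key, math.nan)
        t = treatment.get(key, math.nan)
        # Assumed delta: simple difference. NaN on either side yields NaN.
        details[key] = {"control": c, "treatment": t, "delta": t - c}
    return details


nan_details = details_with_nans(
    {"BERT_pytorch, Adadelta, cuda, (pt2) foreach, maximize": 0.010405212640762329},
    {"BERT_pytorch, Adadelta, cuda, (pt2) differentiable": 0.0033336214274944116},
)
```

Under this contract the consumer of the YAML has to check for NaN (for example with `math.isnan`) before using any control, treatment, or delta value.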
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1509
Reviewed By: xuzhao9
Differential Revision: D44518353
Pulled By: janeyx99
fbshipit-source-id: d701cf886a7126f0776644cc3ba6d7150441cc66