Enable FLOPs reporting for torchvision models (#748)
Summary:
Test run on CPU:
```
python run.py mnasnet1_0 --flops
```
Result:
```
CPU Total Wall Time: 931.076 milliseconds
FLOPS: 0.0224 TFLOPs per second
```
Test run on GPU:
```
python run.py resnet50 -d cuda -t eval --flops
```
Result:
```
GPU Time: 31.228 milliseconds
CPU Dispatch Time: 8.527 milliseconds
CPU Total Wall Time: 31.234 milliseconds
FLOPS: 8.4247 TFLOPs per second
```
Note: the `--flops` option only works for eval tests, because the `fvcore` FLOP counter does not count the backward pass.
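For reading the `FLOPS` lines in the results above, here is a minimal sketch of how a throughput figure relates to a per-iteration FLOP count and a measured latency. It assumes the reported value is the counted FLOPs divided by the wall time; this commit message does not say which of the reported timings `run.py` actually uses, so treat that as an assumption. The helper names `tflops_per_second` and `flops_per_iteration` are hypothetical and are not part of the benchmark code.

```python
def tflops_per_second(flop_count: float, wall_time_ms: float) -> float:
    """Convert a per-iteration FLOP count and a latency in ms to TFLOPs/s.

    Assumption: throughput = FLOPs / elapsed seconds, scaled by 1e12.
    """
    if wall_time_ms <= 0:
        raise ValueError("wall_time_ms must be positive")
    seconds = wall_time_ms / 1e3
    return flop_count / seconds / 1e12


def flops_per_iteration(tflops: float, wall_time_ms: float) -> float:
    """Invert the conversion: recover the FLOP count implied by a result."""
    if wall_time_ms <= 0:
        raise ValueError("wall_time_ms must be positive")
    return tflops * 1e12 * (wall_time_ms / 1e3)


if __name__ == "__main__":
    # Figures taken from the GPU result above; pairing the throughput with
    # the CPU total wall time is an assumption, not something the log states.
    implied = flops_per_iteration(8.4247, 31.234)
    print(f"Implied FLOPs per iteration: {implied:.4e}")
```

Under the same assumption, the conversion also explains why a count that covers only the forward pass would understate the work done in a train test: the numerator would omit the backward computation while the measured time would include it.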
Pull Request resolved: https://github.com/pytorch/benchmark/pull/748
Reviewed By: erichan1
Differential Revision: D34161945
Pulled By: xuzhao9
fbshipit-source-id: 20fdd6e5af72760840411ef7677107520427d00c