Enable peak memory measurement by default (#1392)
Summary:
CPU and GPU peak memory measurements are now enabled by default.
Example output:
```
$ python3 run.py -d cuda -t train BERT_pytorch
Running train method from BERT_pytorch on cuda in eager mode with input batch size 16.
GPU Time: 107.128 milliseconds
CPU Total Wall Time: 107.156 milliseconds
GPU 0 Peak Memory: 6.2205 GB
CPU Peak Memory: 2.8018 GB
```
The GPU measurement depends on the pynvml package, which can be installed with the following command:
```
pip install pynvml
```
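For readers unfamiliar with pynvml, the following is a minimal sketch of how a peak GPU memory figure could be collected by polling NVML in a background thread. It is not the code added in this PR. `PeakSampler`, `nvml_used_bytes_reader`, and `format_gb` are hypothetical names introduced for illustration, and the GB conversion (1024**3 bytes) is an assumption about how the reported value is computed.
```python
import threading


class PeakSampler:
    """Hypothetical helper: poll a byte-count reader and keep the maximum seen."""

    def __init__(self, read_bytes, interval_s=0.01):
        self._read_bytes = read_bytes      # callable returning current usage in bytes
        self._interval_s = interval_s      # polling period in seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak_bytes = 0

    def _run(self):
        # Sample until asked to stop; wait() doubles as an interruptible sleep.
        while not self._stop.is_set():
            self.peak_bytes = max(self.peak_bytes, self._read_bytes())
            self._stop.wait(self._interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # Take one last sample so very short workloads are not missed entirely.
        self.peak_bytes = max(self.peak_bytes, self._read_bytes())
        return False


def nvml_used_bytes_reader(device_index=0):
    """Return a reader for device-wide used memory via NVML (requires pynvml)."""
    import pynvml  # imported lazily so the sampler itself has no GPU dependency

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    return lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).used


def format_gb(num_bytes):
    # Assumption: 1 GB is treated as 1024**3 bytes.
    return f"{num_bytes / 1024**3:.4f} GB"
```
A workload would be wrapped as `with PeakSampler(nvml_used_bytes_reader(0)) as s: ...` and the result printed with `format_gb(s.peak_bytes)`. Note that NVML reports memory used on the whole device, so other processes sharing the GPU would be included in a figure obtained this way, and a polling sampler can miss spikes shorter than its interval.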
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1392
Reviewed By: aaronenyeshi
Differential Revision: D42929574
Pulled By: xuzhao9
fbshipit-source-id: 431c8692bdb142cee3e515d64630323d512e7665