Add NVML monitor (#1288)
Summary:
For machines without DCGM installed, we can use [NVIDIA NVML APIs](https://developer.nvidia.com/nvidia-management-library-nvml) to obtain GPU peak memory usage. The runtime version of NVML ships with the NVIDIA display driver, and the SDK provides the appropriate header, stub libraries and sample applications.
You can use `pip install nvidia-ml-py` to install the required package.
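For reference, a minimal sketch of how peak memory could be sampled through the `pynvml` module that `nvidia-ml-py` provides. This is not the code added in this PR: `track_peak`, `measure_peak`, and `nvml_used_bytes_sampler` are hypothetical names, and the polling interval is an arbitrary choice. NVML reports device-wide used memory, so polling can miss spikes shorter than the interval and includes memory held by other processes on the same GPU.

```python
import threading
import time


def track_peak(sample_fn, stop_event, interval_s=0.01):
    """Poll sample_fn until stop_event is set; return the largest value seen."""
    peak = sample_fn()
    while not stop_event.is_set():
        time.sleep(interval_s)
        peak = max(peak, sample_fn())
    return peak


def measure_peak(workload, sample_fn, interval_s=0.01):
    """Run workload() while a background thread records the peak of sample_fn()."""
    stop = threading.Event()
    result = {}

    def poll():
        result["peak"] = track_peak(sample_fn, stop, interval_s)

    poller = threading.Thread(target=poll, daemon=True)
    poller.start()
    try:
        workload()
    finally:
        # Always stop the poller, even if the workload raises.
        stop.set()
        poller.join()
    return result["peak"]


def nvml_used_bytes_sampler(device_index=0):
    """Return a zero-argument function reporting used device memory in bytes."""
    import pynvml  # imported lazily so the helpers above work without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    return lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).used
```

On a machine with an NVIDIA driver, `measure_peak(run_model, nvml_used_bytes_sampler(0))` would return the highest sampled usage in bytes, where `run_model` stands in for whatever callable is being benchmarked.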
This PR adds a `--metrics-gpu-backend` option that lets users switch the metrics collection backend between `dcgm` and `nvml`.
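As an illustration only, an option of this shape could be declared with `argparse` roughly as follows. The parser, the `select_backend` helper, the help text, and the default value are assumptions made for this sketch, not the code in this PR.

```python
import argparse

# The two backends named in this PR.
GPU_BACKENDS = ("dcgm", "nvml")


def build_parser():
    """Build a hypothetical parser exposing only the backend option."""
    parser = argparse.ArgumentParser(description="Hypothetical benchmark runner")
    parser.add_argument(
        "--metrics-gpu-backend",
        choices=GPU_BACKENDS,
        default="dcgm",  # assumption: the real default may differ
        help="Backend used to collect GPU metrics.",
    )
    return parser


def select_backend(argv):
    """Parse argv and return the chosen backend name."""
    return build_parser().parse_args(argv).metrics_gpu_backend
```

With `choices` set, `argparse` rejects any value other than `dcgm` or `nvml` before the benchmark starts.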
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1288
Reviewed By: xuzhao9
Differential Revision: D41131822
Pulled By: FindHao
fbshipit-source-id: 01efe08aa2bc0aa264c3dbc87d31bb4d5334c81c