benchmark
b5e0b83c - Add support for NVIDIA DCGM based FLOPs/sec calculation (#929)

Commit
3 years ago
Summary:

Add choices for `--flops`:

- `--flops model`: uses an estimation method to calculate the FLOPs.
- `--flops dcgm`: uses the NVIDIA DCGM API to collect hardware counters for FP32 computations, then calculates the FLOPs from them.

## Dependency

[NVIDIA DCGM](https://developer.nvidia.com/dcgm) is required by this feature and can be installed by following the [official installation guide](https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-user-guide/getting-started.html#installation). `numba` is also required and can be installed with `pip install numba`.

## Run

For example, run the following command to get the FLOPs of resnet50:

```
python run.py -d cuda --flops dcgm resnet50
```

The last part of the output should look like the following:

```
GPU Time: 12.097 milliseconds
CPU Total Wall Time: 12.137 milliseconds
FLOPS: 1.9684 TFLOPs per second
Correctness: Correct
```

Pull Request resolved: https://github.com/pytorch/benchmark/pull/929
Reviewed By: xuzhao9
Differential Revision: D36644440
Pulled By: FindHao
fbshipit-source-id: a927bf891ccc0b590af69e3cf5d062440ff371b6
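The commit message does not say which DCGM fields or which formula the `--flops dcgm` path uses. The sketch below is therefore only an illustration of one common approximation, not the benchmark's implementation. It assumes you have sampled DCGM's `DCGM_FI_PROF_PIPE_FP32_ACTIVE` profiling field, which reports the fraction of cycles the FP32 pipe was busy, and it scales the GPU's theoretical FP32 peak by that fraction. `GpuSpec`, `estimate_fp32_tflops`, and the A100-like numbers are hypothetical. The formula also assumes every active FP32 cycle retires fused multiply-adds (2 FLOPs per lane), so treat the result as an upper-bound-style estimate.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical description of a GPU's FP32 execution resources."""

    sm_count: int           # number of streaming multiprocessors
    fp32_lanes_per_sm: int  # FP32 lanes (CUDA cores) per SM
    clock_hz: float         # SM clock assumed for the estimate

    def peak_fp32_flops(self) -> float:
        # Assumption: each lane retires one fused multiply-add (2 FLOPs) per cycle.
        return self.sm_count * self.fp32_lanes_per_sm * 2 * self.clock_hz


def estimate_fp32_tflops(pipe_fp32_active: Sequence[float], spec: GpuSpec) -> float:
    """Scale peak FP32 throughput by the mean FP32-pipe activity ratio.

    `pipe_fp32_active` holds samples in [0, 1], e.g. values of the
    DCGM_FI_PROF_PIPE_FP32_ACTIVE field collected while the model runs.
    Returns the estimate in TFLOPs per second.
    """
    if not pipe_fp32_active:
        raise ValueError("need at least one sample")
    if any(not 0.0 <= s <= 1.0 for s in pipe_fp32_active):
        raise ValueError("samples must be ratios in [0, 1]")
    return fmean(pipe_fp32_active) * spec.peak_fp32_flops() / 1e12


if __name__ == "__main__":
    # Illustrative values only: an A100-like part (108 SMs, 64 FP32 lanes/SM, 1.41 GHz)
    # and made-up activity samples, not measurements from this benchmark.
    spec = GpuSpec(sm_count=108, fp32_lanes_per_sm=64, clock_hz=1.41e9)
    samples = [0.08, 0.10, 0.12]
    print(f"FLOPS: {estimate_fp32_tflops(samples, spec):.4f} TFLOPs per second")
```

In a real run, the samples would come from DCGM while the workload executes. Averaging the ratio over the sampling window keeps the estimate independent of how many samples were taken.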