Add help for run_benchmark (#2361)
Summary:
Add help text for the Tritonbench command-line options, so that `--help` describes each flag.
```
$ python run_benchmark.py triton --help
usage: run_benchmark.py [-h] [--op OP] [--mode {fwd,bwd,fwd_bwd}] [--bwd] [--fwd_bwd] [--device DEVICE] [--warmup WARMUP] [--iter ITER] [--csv] [--dump-csv] [--skip-print] [--plot] [--ci] [--metrics METRICS] [--only ONLY] [--baseline BASELINE]
[--num-inputs NUM_INPUTS] [--keep-going] [--input-id INPUT_ID] [--test-only] [--dump-ir]
options:
-h, --help show this help message and exit
--op OP Operator to benchmark.
--mode {fwd,bwd,fwd_bwd}
Test mode (fwd, bwd, or fwd_bwd).
--bwd Run backward pass.
--fwd_bwd Run both forward and backward pass.
--device DEVICE Device to benchmark.
--warmup WARMUP Num of warmup runs for each benchmark run.
--iter ITER Num of reps for each benchmark run.
--csv Print result as csv.
--dump-csv Dump result as csv.
--skip-print Skip printing result.
--plot Plot the result.
--ci Run in the CI mode.
--metrics METRICS Metrics to collect, split with comma. E.g., --metrics latency,tflops,speedup.
--only ONLY Specify one or multiple operator implementations to run.
--baseline BASELINE Override default baseline.
--num-inputs NUM_INPUTS
Number of example inputs.
--keep-going
--input-id INPUT_ID Specify the start input id to run. For example, --input-id 0 runs only the first available input sample. When used together like --input-id <X> --num-inputs <Y>, start from the input id <X> and run <Y> different inputs.
--test-only Run this under test mode, potentially skipping expensive steps like autotuning.
--dump-ir Dump Triton IR
```
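For illustration, a minimal sketch of how `help=` strings on argparse arguments yield a listing like the one above, and of the `--input-id` / `--num-inputs` behavior as the help text describes it. This is not the Tritonbench source: `build_parser`, `split_metrics`, and `select_inputs` are hypothetical names, only a subset of the flags is shown, the `--input-id` help string is shortened, and the defaults are assumptions.

```python
import argparse
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the flags in the help output."""
    parser = argparse.ArgumentParser(prog="run_benchmark.py")
    parser.add_argument("--op", type=str, default=None, help="Operator to benchmark.")
    parser.add_argument(
        "--metrics",
        default=None,
        help="Metrics to collect, split with comma. E.g., --metrics latency,tflops,speedup.",
    )
    # Defaults of None are an assumption made for this sketch.
    parser.add_argument("--num-inputs", type=int, default=None, help="Number of example inputs.")
    parser.add_argument("--input-id", type=int, default=None, help="Specify the start input id to run.")
    return parser


def split_metrics(value: Optional[str]) -> List[str]:
    """Split a comma-separated --metrics value into a list of metric names."""
    if not value:
        return []
    return [m.strip() for m in value.split(",") if m.strip()]


def select_inputs(
    inputs: Sequence[T], input_id: Optional[int], num_inputs: Optional[int]
) -> List[T]:
    """Pick the inputs to run, following the wording of the help text.

    - neither flag: run every input
    - --input-id X alone: run only input X
    - --num-inputs Y alone: run the first Y inputs
    - both: start from input X and run Y inputs
    """
    if input_id is None and num_inputs is None:
        return list(inputs)
    start = input_id if input_id is not None else 0
    count = num_inputs if num_inputs is not None else 1
    return list(inputs[start:start + count])


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--op", "gemm", "--input-id", "2", "--num-inputs", "3", "--metrics", "latency,tflops"]
    )
    print(split_metrics(args.metrics))
    print(select_inputs(list(range(10)), args.input_id, args.num_inputs))
```

Running this sketch with `--help` prints the `help=` strings next to each flag, which is the kind of output this change adds.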
```
$ python run_benchmark.py triton --op gemm --num-inputs 1 --only triton_tutorial_matmul
(M, N, K)        triton_tutorial_matmul-latency
---------------  --------------------------------
(256, 256, 256)  0.0033702
```
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2361
Reviewed By: jananisriram
Differential Revision: D59374656
Pulled By: xuzhao9
fbshipit-source-id: 139f865895d7550a3475a1a8b4bed037a9ecc769