add a new flag to select machine for op benchmark (#29349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29349
This diff adds a new flag, --device, to select the device (cpu or cuda) on which op benchmarks run. The default is None, which tries to run the benchmarks on all supported devices.
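As a rough illustration of the behavior described above, the sketch below shows one way a device flag with a None default could filter benchmark configs. It is not the code from this diff: `SUPPORTED_DEVICES`, `parse_args`, `select_configs`, and the dict-based config layout are hypothetical names chosen for the example. Only the `--device` flag and the cpu/cuda values come from the test plan.

```python
import argparse

# Hypothetical list of devices covered by the benchmark configs (an assumption).
SUPPORTED_DEVICES = ["cpu", "cuda"]


def parse_args(argv=None):
    """Parse a --device flag; None means 'do not filter by device'."""
    parser = argparse.ArgumentParser(description="op benchmark runner (sketch)")
    parser.add_argument(
        "--device",
        default=None,
        choices=SUPPORTED_DEVICES,
        help="Run only the configs for this device; default runs all of them.",
    )
    return parser.parse_args(argv)


def select_configs(configs, device=None):
    """Keep the configs whose 'device' entry matches; None keeps every config."""
    if device is None:
        return list(configs)
    return [cfg for cfg in configs if cfg["device"] == device]


# Hypothetical configs mirroring the shapes shown in the test plan output.
configs = [
    {"M": 64, "N": 64, "K": 64, "device": "cpu"},
    {"M": 64, "N": 64, "K": 64, "device": "cuda"},
]

if __name__ == "__main__":
    args = parse_args()
    for cfg in select_configs(configs, args.device):
        print(cfg)
```

A real runner would also likely need to skip cuda configs on machines without a GPU; that check is left out of this sketch.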
Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 124.283
...
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cuda_bwdall
# Input: M: 64, N: 64, K: 128, device: cuda
Backward Execution Time (us) : 176.592
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 121.884
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 26.002
```
Reviewed By: hl475
Differential Revision: D18363942
fbshipit-source-id: fccd1fd09bcd6d7725e6fa4063559a27d9cc3065