change op benchmark forward_only flag (#28967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28967
Change the `forward_only` flag to take an explicit `True` or `False` value so that it can be integrated with PEP.
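A presence-only (`store_true`) flag cannot be explicitly set to `False`, which is awkward for a tool that always passes a value. Below is a minimal sketch of one way to accept `--forward_only True` / `--forward_only False` with Python's `argparse`. The `str2bool` helper and the exact `add_argument` options are assumptions for illustration, not necessarily what the operator benchmark runner uses.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: convert a command-line string such as "True"/"False" to a bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="operator benchmark flags (sketch)")
    # Assumed definition: the flag takes an explicit value, e.g. `--forward_only False`.
    parser.add_argument(
        "--forward_only",
        type=str2bool,
        nargs="?",      # the value is optional...
        const=True,     # ...so a bare `--forward_only` still means True
        default=False,  # omitted entirely: run forward and backward benchmarks
        help="Only run the forward path of operators",
    )
    parser.add_argument("--iterations", type=int, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--forward_only", "False", "--iterations", "1"])
    print(args.forward_only, args.iterations)  # False 1
```

Keeping `nargs="?"` with `const=True` is a design choice in this sketch: it stays backward compatible with the old bare-flag usage while allowing an explicit `False`.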
Test Plan:
```
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only True --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 152.489
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 236.608
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only False --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 147.174
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 253.437
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1044.082
```
Reviewed By: hl475
Differential Revision: D18247416
fbshipit-source-id: 1c6cff1ac98233d4f0ca298e0cb4a0d3466e5834