reduce op bench binary size (#29496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496
This diff reduces the binary size of the operator benchmark by no longer creating all tests at once.
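As a rough illustration of the idea (not the code in this diff), tests can be produced one at a time with generators instead of being materialized up front. The helper names `generate_configs`, `make_test_name`, and `iter_tests` below are hypothetical and are not part of the operator benchmark API; the naming scheme is only an assumption chosen to resemble the names in the test plan output.

```python
import itertools


def generate_configs(**axes):
    """Lazily yield one config dict per combination of axis values.

    Hypothetical helper: nothing is stored, each config is built on demand.
    """
    names = list(axes)
    for values in itertools.product(*axes.values()):
        yield dict(zip(names, values))


def make_test_name(op_name, config):
    """Build a name such as 'add_M8_N2_K1_cpu' (assumed naming scheme)."""
    parts = [v if isinstance(v, str) else f"{k}{v}" for k, v in config.items()]
    return "_".join([op_name] + parts)


def iter_tests(op_name, configs):
    """Yield (name, config) pairs one by one rather than building a full list."""
    for config in configs:
        yield make_test_name(op_name, config), config


# Only one test exists in memory at any point during iteration.
tests = iter_tests("add", generate_configs(M=[8], N=[2], K=[1, 8], device=["cpu"]))
names = [name for name, _ in tests]
```

Whether lazy creation of this kind matches what the diff actually does is an assumption; the sketch only shows the general pattern of deferring test construction until a test is about to run.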
Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
```
Reviewed By: hl475
Differential Revision: D18412342
fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae