Add the test_bench userbenchmark (#2052)
Summary:
The plan is for the `test_bench` userbenchmark to replace `test_bench.py`, which will then be deprecated.
It supports:
1. Running multiple models, each in its own subprocess.
2. Running with extra args such as `--torchdynamo inductor`.
3. A `--debug` option that saves each subprocess's output in an `output-%Y%m%d%H%M%S` directory.
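The subprocess-per-model and timestamped-output behavior listed above can be illustrated with a minimal sketch. This is not the code added by this PR: the helper names `make_output_dir` and `run_in_subprocess` are hypothetical, and the `stdout.log` file name is an assumption (only `stderr.log` appears in the test plan).

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import List


def make_output_dir(base: Path) -> Path:
    """Create a directory named with the output-%Y%m%d%H%M%S pattern under base."""
    out = base / time.strftime("output-%Y%m%d%H%M%S")
    out.mkdir(parents=True, exist_ok=True)
    return out


def run_in_subprocess(name: str, cmd: List[str], output_dir: Path) -> int:
    """Run cmd in its own process and save its output under output_dir/name.

    Hypothetical helper: stdout.log is an assumed file name; stderr.log
    mirrors the log file mentioned in the test plan.
    """
    run_dir = output_dir / name
    run_dir.mkdir(parents=True, exist_ok=True)
    with open(run_dir / "stdout.log", "w") as out, open(run_dir / "stderr.log", "w") as err:
        # A crash in the child (e.g. a CUDA error) only affects this run;
        # the parent keeps going and reports the return code.
        proc = subprocess.run(cmd, stdout=out, stderr=err)
    return proc.returncode
```

Calling `run_in_subprocess` once per model keeps a failure in one model from taking down the remaining runs, which is presumably the motivation for the one-subprocess-per-model design.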
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2052
Test Plan:
```
$ python run_benchmark.py test_bench llama_v2_7b_16h -d cuda -t eval --accuracy --debug
Running TorchBenchModelConfig(name='llama_v2_7b_16h', test='eval', device='cuda', batch_size=None, extra_args=['--accuracy'], extra_env=None, output_dir=PosixPath("/data/users/xzhao9/git/benchmark/.userbenchmark/test_bench/output-20231122193846/model=llama_v2_7b_16h, test=eval, device=cuda, bs=None, extra_args=['--accuracy']")) ...[done]
```
The error log is saved to `test_bench/output-20231122193846/model=llama_v2_7b_16h, test=eval, device=cuda, bs=None, extra_args=['--accuracy']/stderr.log`:
```
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 424, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 321, in forward
    query_states = self.q_proj(hidden_states)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
```
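When a `--debug` run covers several models, the final line of each `stderr.log` is usually enough to triage failures. A minimal sketch, assuming the layout shown above (one subdirectory per config, each containing a `stderr.log`); the helper name `summarize_errors` is hypothetical and not part of this PR.

```python
from pathlib import Path
from typing import Dict


def summarize_errors(output_dir: Path) -> Dict[str, str]:
    """Map each run directory name to the last non-empty line of its stderr.log.

    Assumes one subdirectory per config directly under output_dir.
    Runs whose stderr.log is empty or missing are omitted.
    """
    summary: Dict[str, str] = {}
    for log in sorted(output_dir.glob("*/stderr.log")):
        lines = [ln for ln in log.read_text(errors="replace").splitlines() if ln.strip()]
        if lines:
            summary[log.parent.name] = lines[-1]
    return summary
```

For the run above, the entry for the `llama_v2_7b_16h` config would be the `RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED ...` line.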
Reviewed By: aaronenyeshi
Differential Revision: D51542761
Pulled By: xuzhao9
fbshipit-source-id: acf0616c791a72c3d7f015a1b77cba4a017d915d