Fix broken unit test (#1743)
Summary:
Error:
```
$ python run.py hf_T5_large -d cuda -t eval --accuracy
fp64 golden ref were not generated for hf_T5_large. Setting accuracy check to cosine
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Traceback (most recent call last):
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 476, in check_accuracy
    correct_result = run_n_iterations(
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 375, in run_n_iterations
    _model_iter_fn(mod, inputs, contexts, optimizer, collect_outputs=False)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 373, in _model_iter_fn
    forward_pass(mod, inputs, contexts, collect_outputs)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 354, in forward_pass
    return mod(*inputs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/benchmark/torchbenchmark/util/framework/huggingface/model_factory.py", line 46, in forward
    return self.model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1683, in forward
    encoder_outputs = self.encoder(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
    layer_outputs = layer_module(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
    self_attention_outputs = self.layer[0](
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 519, in forward
    query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
```
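The traceback bottoms out in a plain fp16 `F.linear` inside T5's self-attention query projection, which dispatches to `cublasGemmEx` on CUDA. The call pattern can be reproduced in isolation with a sketch like the following (dimensions are assumptions based on T5-large's `d_model=1024`; the batch and sequence sizes are illustrative):

```python
import torch
import torch.nn.functional as F

# Sketch of the failing op: a bias-free linear projection, as used by
# T5 attention layers (T5 projections have no bias). On CUDA this lowers
# to the cublasGemmEx call shown in the error message; on CPU it is a
# plain GEMM, so we fall back to fp32 there since fp16 GEMM is GPU-only.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

hidden = torch.randn(8, 512, 1024, device=device, dtype=dtype)  # (batch, seq, d_model)
weight = torch.randn(1024, 1024, device=device, dtype=dtype)    # q-projection weight

out = F.linear(hidden, weight)
print(out.shape)  # torch.Size([8, 512, 1024])
```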
Still not sure what the root cause is, but updating the batch size to 1 fixes it.
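The shape of the workaround can be sketched as a small helper that caps the eval batch size for the affected model. This is an illustration only; the function and set names are hypothetical, not the actual torchbenchmark code:

```python
# Hypothetical sketch of the workaround: force eval batch size 1 for
# models that hit CUBLAS_STATUS_NOT_SUPPORTED in fp16 eval.
FORCE_EVAL_BSIZE_ONE = {"hf_T5_large"}  # illustrative allowlist

def pick_eval_batch_size(model_name: str, requested: int) -> int:
    """Return the batch size to use for eval, overriding known-bad models."""
    if model_name in FORCE_EVAL_BSIZE_ONE:
        return 1
    return requested
```

With this in place, `pick_eval_batch_size("hf_T5_large", 8)` returns 1 while other models keep their requested batch size.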
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1743
Reviewed By: davidberard98
Differential Revision: D46975729
Pulled By: xuzhao9
fbshipit-source-id: 80a367b2bd00e76ddaecfc62a7078baa14b4526a