Fix the dimension mismatch issues when running the BERT model (#23330)
Summary:
We found the following dimension mismatch issue when running the BERT model with dynamic quantization (a hedged sketch of the setup follows the traceback):
```
Traceback (most recent call last):
File "bert.py", line 75, in <module>
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 709, in forward
head_mask=head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 437, in forward
layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 415, in forward
attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 372, in forward
self_outputs = self.self(input_tensor, attention_mask, head_mask)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 303, in forward
query_layer = self.transpose_for_scores(mixed_query_layer)
File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 296, in transpose_for_scores
return x.permute(0, 2, 1, 3)
RuntimeError: number of dims don't match in permute
```
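For context, the exact ```bert.py``` repro script is not part of this commit; the sketch below is an assumption about how the model may have been dynamically quantized, using the public ```torch.quantization.quantize_dynamic``` entry point, which routes ```nn.Linear``` layers through the dynamic quantized linear kernels (the ```fbgemm_linear_dynamic``` ops in the traceback). The model name and API call are illustrative only and may differ from the original setup.
```
import torch
from pytorch_transformers import BertModel  # package from the traceback paths

# Assumed repro setup: load a pretrained BERT and put it in eval mode.
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Dynamically quantize the nn.Linear modules; their forward passes then call
# the dynamic quantized linear ops instead of the fp32 linear kernel.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```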
Before quantization, the shape of ```x``` in ```transpose_for_scores``` is ```[1, 14, 12, 64]```;
after quantization, the shape of ```x``` in ```transpose_for_scores``` is ```[14, 12, 64]```.
There is a dimension mismatch in the output of the ```torch.ops.quantized.fbgemm_linear_dynamic``` operators: the first (batch) dimension is missing, which causes the failure in the ```permute``` above. A minimal illustration of the failure mode follows.
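The shapes below are the ones observed in the report; permuting a 3-D tensor with four permutation indices raises the same class of error as in the traceback:
```
import torch

# Shape observed before quantization: [batch, seq_len, num_heads, head_dim]
x_before = torch.randn(1, 14, 12, 64)
x_before.permute(0, 2, 1, 3)  # works: 4 dims, 4 permutation indices

# Shape observed after quantization: leading batch dimension is missing
x_after = torch.randn(14, 12, 64)
try:
    x_after.permute(0, 2, 1, 3)  # 3 dims but 4 permutation indices
except RuntimeError as e:
    print(e)  # same error class as "number of dims don't match in permute"
```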
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23330
ghstack-source-id: 88287092
Differential Revision: D16463334
fbshipit-source-id: 4bdb836d1df31ba7c0bd44e3339aabdc8b943ae1