Override __setstate__ in MultiheadAttention (#29001)
Summary:
Override `__setstate__` in `nn.MultiheadAttention` and add the `_qkv_same_embed_dim` attribute to the instance `__dict__` on unpickling. Current users should not be affected by the change.
The change was tested by loading a MultiheadAttention model trained with PyTorch 1.1. To run inference with an old MultiheadAttention model under v1.4.0 and above, load it with `torch.load`:
```
import torch
model = torch.load('old_v1.1.0_MultiheadAttention.pt') # model works for torch 1.4
```
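The `__setstate__` override described above can be illustrated with a minimal sketch. This is not the code from the PR: `Attention` is a hypothetical stand-in class written in plain Python, and the default of `True` for the missing attribute is an assumption (old checkpoints are assumed to use the same embedding dimension for query, key, and value).

```python
class Attention:
    """Hypothetical stand-in for nn.MultiheadAttention (not the real class)."""

    def __init__(self, embed_dim, kdim=None, vdim=None):
        self.embed_dim = embed_dim
        self.kdim = embed_dim if kdim is None else kdim
        self.vdim = embed_dim if vdim is None else vdim
        self._qkv_same_embed_dim = (
            self.kdim == embed_dim and self.vdim == embed_dim
        )

    def __setstate__(self, state):
        # Objects saved before the attribute existed lack this key.
        # Assumption: such objects used one embed dim for q, k, and v.
        if "_qkv_same_embed_dim" not in state:
            state["_qkv_same_embed_dim"] = True
        self.__dict__.update(state)


# Simulate the state stored in an old checkpoint: no `_qkv_same_embed_dim`.
old_state = {"embed_dim": 8, "kdim": 8, "vdim": 8}

# Unpickling creates the object without calling __init__ and then calls
# __setstate__ with the saved state; the same two steps are done by hand here.
restored = Attention.__new__(Attention)
restored.__setstate__(old_state)
```

After the call, `restored` carries the back-filled attribute, so code that reads `_qkv_same_embed_dim` no longer fails on objects saved by an older version.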
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29001
Differential Revision: D18257671
Pulled By: zhangguanheng66
fbshipit-source-id: fa41b85f6d53034dc9f445af60f2ad9636e9abf7