Support MultiheadAttention module (#28555)
Summary:
This makes `nn.MultiheadAttention` compatible with TorchScript.
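For example (a minimal sketch, not taken from the PR; dimensions and shapes are illustrative), a module containing the attention layer can now be scripted directly:

```python
import torch
import torch.nn as nn

# Scripting nn.MultiheadAttention; this compiles after this change.
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
scripted = torch.jit.script(mha)

# Inputs are (seq_len, batch, embed_dim); self-attention passes the
# same tensor as query, key, and value.
query = key = value = torch.randn(5, 2, 16)
attn_output, attn_weights = scripted(query, key, value)
print(attn_output.shape)  # torch.Size([5, 2, 16])
```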
It also breaks backward compatibility for old serialized models that do not have `_qkv_same_embed_dim` as an attribute.
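One possible fix-up for such models (a hedged sketch; `old_mha` is an illustrative name, and it assumes the module was built with the default case where the key/value dimensions equal `embed_dim`) is to restore the attribute on the unpickled module before use:

```python
# Hypothetical fix-up: set the missing attribute on a module that was
# pickled before this change. True corresponds to the common case where
# query, key, and value all share embed_dim.
if not hasattr(old_mha, "_qkv_same_embed_dim"):
    old_mha._qkv_same_embed_dim = True
```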
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28555
Pulled By: driazati
Differential Revision: D18124746
fbshipit-source-id: 5c5042fc6fc0e557db859a8ae05174cba5fce6a9