Fix DDP incompatibility issue with nn.MultiheadAttention. (#26826)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/26698.
With different query/keys/value dimensions, `nn.MultiheadAttention` has DDP incompatibility issue because in that case `in_proj_weight` attribute is created but not used. Fix it and add a distributed unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26826
Differential Revision: D17583807
Pulled By: zhangguanheng66
fbshipit-source-id: c393584c331ed4f57ebaf2d4015ef04589c973f6