Improve MHA docs (#61977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60831
Also clarifies the relationship between `embed_dim` and `num_heads` (see https://github.com/pytorch/pytorch/issues/60853 and https://github.com/pytorch/pytorch/issues/60445).
Formatting was overhauled to remove redundancy between the argument docs and the shape docs; suggestions / comments welcome!
Link to rendered docs here: https://14912919-65600975-gh.circle-artifacts.com/0/docs/generated/torch.nn.MultiheadAttention.html
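To illustrate the `embed_dim` / `num_heads` relationship being clarified: `embed_dim` must be divisible by `num_heads`, and each head operates on a slice of size `embed_dim // num_heads` (the full `embed_dim` is split across heads, not replicated per head). A minimal sketch; the dimensions chosen here are arbitrary examples:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4  # embed_dim must be divisible by num_heads
mha = nn.MultiheadAttention(embed_dim, num_heads)
# each head attends over a slice of size embed_dim // num_heads == 4

L, N = 5, 2  # sequence length, batch size (batch_first=False by default)
q = torch.randn(L, N, embed_dim)
attn_out, attn_weights = mha(q, q, q)  # self-attention: query == key == value

print(attn_out.shape)      # torch.Size([5, 2, 16])  -- (L, N, embed_dim)
print(attn_weights.shape)  # torch.Size([2, 5, 5])   -- averaged over heads
```

Note that a constructor call with `embed_dim=16, num_heads=3` would raise an error, since 16 is not divisible by 3.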
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61977
Reviewed By: bhosmer
Differential Revision: D29876884
Pulled By: jbschlosser
fbshipit-source-id: a3e82083219cc4f8245c021d309ad9d92bf39196