Fix documentation for attention mask shape (#20850)
Summary:
The attention mask should have shape `(L, S)`, where `L` is the target sequence length and `S` is the source sequence length, since it is added to `attn_output_weights`.
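
A minimal sketch of the documented shape, assuming the public `torch.nn.MultiheadAttention` module with its default sequence-first layout. The sizes and variable names below are made up for illustration and are not taken from this change:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): target length L, source length S,
# batch size N, embedding dimension E.
L, S, N, E, num_heads = 3, 5, 2, 8, 2

mha = nn.MultiheadAttention(E, num_heads)

query = torch.randn(L, N, E)  # (L, N, E)
key = torch.randn(S, N, E)    # (S, N, E)
value = torch.randn(S, N, E)  # (S, N, E)

# Additive float mask of shape (L, S): 0 keeps a position,
# -inf removes it once the softmax is applied.
attn_mask = torch.zeros(L, S)
attn_mask[:, -1] = float("-inf")  # hide the last source position from every query

out, weights = mha(query, key, value, attn_mask=attn_mask)

print(out.shape)      # attention output, one vector per target position
print(weights.shape)  # attention weights, averaged over heads
```

Because the mask is added to the weights before the softmax, the masked source position ends up with zero attention weight for every target position.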
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20850
Differential Revision: D15495587
Pulled By: ezyang
fbshipit-source-id: 61d6801da5291df960daab273e874df28aedbf6e