Add a note about torch.nn.functional.scaled_dot_product_attention (#120668)
Follow-up to https://github.com/pytorch/pytorch/pull/120565
- Added a note to the transformer class pointing out that its mask definition is the opposite of :attr:`attn_mask` in
torch.nn.functional.scaled_dot_product_attention.
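
For reviewers, a minimal sketch of the difference for boolean masks. In the transformer modules, `True` marks a position that is not allowed to attend, while in `scaled_dot_product_attention` a `True` in `attn_mask` marks a position that takes part in attention. The variable names (`transformer_style_mask`, `sdpa_style_mask`) and tensor sizes are illustrative and not part of this change; the sketch assumes a PyTorch version that provides `scaled_dot_product_attention` (2.0 or later).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, embed_dim = 4, 8

# Shape: (batch, heads, seq_len, embed_dim)
q = torch.randn(1, 1, seq_len, embed_dim)
k = torch.randn(1, 1, seq_len, embed_dim)
v = torch.randn(1, 1, seq_len, embed_dim)

# Transformer-style boolean causal mask: True = NOT allowed to attend,
# so future positions (above the diagonal) are True.
transformer_style_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

# SDPA-style boolean mask: True = takes part in attention,
# so the transformer-style mask has to be inverted.
sdpa_style_mask = ~transformer_style_mask

out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=sdpa_style_mask)
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The inverted mask should reproduce SDPA's built-in causal masking.
masks_agree = torch.allclose(out_masked, out_causal, atol=1e-5)
```

Passing `transformer_style_mask` directly as `attn_mask` would instead let each position attend only to future positions, which is the mistake the note is meant to prevent.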
@mikaylagawarecki
Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120668
Approved by: https://github.com/mikaylagawarecki