Fix issue in softmax.cu with transformer error when mask seqlen > 1024 (#83639)
Fixes #83142
Adds:
- a test to catch this issue
- a fix to softmax.cu that broadcasts `src_key_padding_mask` to the regular `attention_mask` shape
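The broadcast the fix performs can be sketched in Python (the actual change lives in the CUDA kernel; the helper name and head count below are illustrative, not from the patch):

```python
import torch

def broadcast_key_padding_mask(key_padding_mask: torch.Tensor, num_heads: int) -> torch.Tensor:
    # key_padding_mask: (batch, seq_len) bool, True = padded position.
    # Expand it to the (batch * num_heads, 1, seq_len) shape of a regular
    # attention mask so it broadcasts over query positions in softmax.
    batch, seq_len = key_padding_mask.shape
    attn_mask = key_padding_mask.view(batch, 1, 1, seq_len)
    attn_mask = attn_mask.expand(batch, num_heads, 1, seq_len)
    return attn_mask.reshape(batch * num_heads, 1, seq_len)

# seq_len > 1024 is the regime that triggered the original bug.
mask = torch.zeros(2, 2048, dtype=torch.bool)
mask[:, 1500:] = True  # mark tail positions as padding
print(broadcast_key_padding_mask(mask, num_heads=8).shape)  # torch.Size([16, 1, 2048])
```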
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83639
Approved by: https://github.com/ngimel