Add mask identifier for multiplexed src_mask/src_key_padding_mask in BT (#81947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81947
The Transformer fastpath multiplexes two arguments, src_mask [seq_len x seq_len] and src_key_padding_mask [batch_size x seq_len], into a single mask input and later deduces the mask type from the mask's shape.
When batch_size == seq_len, the two shapes coincide, so any src_mask is wrongly interpreted as a src_key_padding_mask. This is fixed by requiring that a mask_type identifier be supplied whenever batch_size == seq_len.
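For illustration only, a minimal Python sketch of the shape ambiguity and of how an explicit identifier resolves it. The function name `infer_mask_type` and the string labels are hypothetical and are not the identifiers used in this PR; the real logic lives in the fastpath implementation and may differ in detail.

```python
from typing import Optional, Tuple

# Hypothetical labels for this sketch; not PyTorch's actual identifiers.
SRC_MASK = "src_mask"
SRC_KEY_PADDING_MASK = "src_key_padding_mask"


def infer_mask_type(
    mask_shape: Tuple[int, int],
    batch_size: int,
    seq_len: int,
    mask_type: Optional[str] = None,
) -> str:
    """Decide which kind of mask a 2-D mask shape represents."""
    # An explicit identifier always wins over shape-based deduction.
    if mask_type is not None:
        if mask_type not in (SRC_MASK, SRC_KEY_PADDING_MASK):
            raise ValueError(f"unknown mask_type: {mask_type!r}")
        return mask_type

    # When batch_size == seq_len, [seq_len x seq_len] and
    # [batch_size x seq_len] are the same shape, so the shape alone
    # cannot tell the two masks apart.
    if batch_size == seq_len:
        raise ValueError("mask_type is required when batch_size == seq_len")

    # Otherwise the shape is unambiguous.
    if mask_shape == (seq_len, seq_len):
        return SRC_MASK
    if mask_shape == (batch_size, seq_len):
        return SRC_KEY_PADDING_MASK
    raise ValueError(f"unexpected mask shape: {mask_shape}")
```

With batch_size=2 and seq_len=5, a (5, 5) mask resolves to the src_mask label and a (2, 5) mask to the src_key_padding_mask label. With batch_size == seq_len == 4, the sketch refuses to guess unless mask_type is passed.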
Additionally, adds support for src_mask in the masked_softmax CPU path.
Test Plan: existing unit tests + new unit tests (batch_size == seq_len)
Differential Revision: D37932240
Approved by: https://github.com/zrphercule