Add support for TxT mask layout for masked_softmax in BetterTransformer (#77607)
Summary: When the mask has a DxD layout (called TxT in the title), expand it to BxHxDxD so that it matches the shape of the masked_softmax input.
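
For illustration, the expansion can be sketched in Python with public PyTorch ops. This is a minimal sketch of the intended semantics, not the kernel code from this change. It assumes B is the batch size, H the number of heads, D the sequence length, and that True in the mask means "masked out"; the helper name masked_softmax_txt is hypothetical.

```python
import torch


def masked_softmax_txt(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Softmax over the last dim of (B, H, D, D) scores with a shared (D, D) mask.

    Hypothetical helper. Assumption: True in `mask` marks positions to exclude.
    """
    B, H, D, _ = scores.shape
    # Broadcast the (D, D) mask to (B, H, D, D); expand() creates a view, not a copy.
    expanded = mask.view(1, 1, D, D).expand(B, H, D, D)
    # Masked positions get -inf so they contribute exactly zero after softmax.
    return torch.softmax(scores.masked_fill(expanded, float("-inf")), dim=-1)


B, H, D = 2, 3, 4
scores = torch.randn(B, H, D, D)
# Causal-style example mask: position i may not attend to positions j > i.
mask = torch.triu(torch.ones(D, D, dtype=torch.bool), diagonal=1)
out = masked_softmax_txt(scores, mask)
```

The same DxD mask is applied to every batch element and every head, which is why a single expand is enough.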
Test Plan: buck build mode/opt -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/opt/gen/caffe2/test/nn\#binary.par -r masked_softmax_DxD
Differential Revision: D36428170
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77607
Approved by: https://github.com/cpuhrsch