Add masked_softmax support for softmax_elements > 1024, with corresponding unit tests (#69924)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69924
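
For reference, a minimal pure-Python sketch of what a masked softmax computes on a single row longer than 1024 elements. It is an illustration only, not the kernel added in this PR. The helper name `masked_softmax_row`, the convention that `True` in the mask marks an excluded position, and the all-zeros result for a fully masked row are assumptions made for this example.

```python
import math


def masked_softmax_row(scores, mask):
    """Softmax over one row, skipping positions where mask[i] is True.

    Assumption for this sketch: True means "exclude this position".
    """
    kept = [s for s, m in zip(scores, mask) if not m]
    if not kept:
        # Assumption: a fully masked row yields all zeros.
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if m else math.exp(s - peak) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# A row longer than 1024 elements, the size range this change targets.
n = 2048
scores = [0.001 * i for i in range(n)]
mask = [i % 2 == 1 for i in range(n)]  # exclude every odd position
probs = masked_softmax_row(scores, mask)
```

Excluded positions receive probability 0 and the remaining positions sum to 1, which is the property a reference check for rows of this length would typically assert.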
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32819181
fbshipit-source-id: 6838a11d3554ec8e1bd48f1c2c7b1ee3a4680995