Vectorize non-persistent Softmax (#38557)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/36485 with a bug fix and enhanced testing.
Moved `test_softmax_backward` -> `test_softmax_results`; the test now checks forward (fprop) and backward (bgrad) results against the CPU implementation for all cases.
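
For illustration only, a minimal pure-Python sketch of the "compare against a reference" idea behind the test. It is not the code from this PR: `softmax_chunked`, the chunk width of 4, the row sizes and the tolerance are all hypothetical stand-ins for the vectorized kernel and the real test parameters.

```python
import math

def softmax_reference(row):
    """Scalar, numerically stable softmax over one row (stand-in for the CPU path)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(grad_out, out):
    """Softmax gradient: out * (grad_out - sum(grad_out * out))."""
    dot = sum(g * o for g, o in zip(grad_out, out))
    return [o * (g - dot) for g, o in zip(grad_out, out)]

def softmax_chunked(row, width=4):
    """Hypothetical stand-in for a vectorized kernel: same math, but the
    reductions walk the row in fixed-width chunks plus a scalar tail."""
    n = len(row)
    body = n - n % width  # largest prefix that is a multiple of width
    m = -math.inf
    for i in range(0, body, width):
        m = max(m, max(row[i:i + width]))
    for i in range(body, n):
        m = max(m, row[i])
    total = 0.0
    for i in range(0, body, width):
        total += sum(math.exp(x - m) for x in row[i:i + width])
    for i in range(body, n):
        total += math.exp(row[i] - m)
    return [math.exp(x - m) / total for x in row]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def check_against_reference(sizes=(1, 3, 4, 5, 17, 64)):
    """Return the worst forward/backward mismatch over several row lengths,
    including lengths that are not a multiple of the chunk width."""
    worst = 0.0
    for n in sizes:
        row = [3.0 * math.sin(0.7 * i) for i in range(n)]
        grad_out = [math.cos(0.3 * i) for i in range(n)]
        ref = softmax_reference(row)
        got = softmax_chunked(row)
        worst = max(worst, max_abs_diff(ref, got))
        worst = max(worst, max_abs_diff(softmax_backward(grad_out, ref),
                                        softmax_backward(grad_out, got)))
    return worst
```

Row lengths that are not a multiple of the chunk width are included on purpose, since the tail handling is where a chunked path is most likely to diverge from the reference.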
\cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38557
Differential Revision: D21620805
Pulled By: ngimel
fbshipit-source-id: 4f736b3e59f79142e1b982eb643c592dedcbe111