Add the intra-op parallelism for equal operator (#28810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28810
Similar to https://github.com/pytorch/pytorch/pull/28464 and https://github.com/pytorch/pytorch/pull/28477, this enables intra-op parallelism for the equal operator. This should translate into a parallel performance win for the BERT/RoBERTa models.
Test Plan: CI
Differential Revision: D18165752
fbshipit-source-id: 354cede4c36893acbd69711f49aa6a51dc94397f