Enable intra-op parallelism for layer norm (#28464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28464
Enable intra-op parallelism for layer norm. This is expected to yield a parallel performance win for BERT/RoBERTa models.
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: BIT-silence
Differential Revision: D18063407
fbshipit-source-id: c116e744d78ea50b3aadf2e9a819e5b876a944bf