JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the statistics, then fusing the affine operations that follow the reduce operations. This is similar to the batchnorm fusion that apaszke did. It also only works in inference mode for now.
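For illustration only, the decomposition can be sketched in plain Python. This is not the JIT implementation from this PR: the function names `layer_norm_stats` and `layer_norm_decomposed`, the default `eps`, and the list-of-rows layout are assumptions made for the sketch. The reduction step stands in for the stats kernel, and the per-element expression stands in for the part the fuser can merge with the affine operations.

```python
import math


def layer_norm_stats(row, eps=1e-5):
    """Reduction step: return the mean and reciprocal std of one row.

    Hypothetical helper; it stands in for the stats kernel and is not fused.
    """
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n  # biased variance
    return mean, 1.0 / math.sqrt(var + eps)


def layer_norm_decomposed(rows, weight, bias, eps=1e-5):
    """Normalize each row, then apply the affine transform.

    The list comprehension is purely elementwise once mean and rstd are
    known, which is the kind of work a pointwise fuser can combine.
    """
    out = []
    for row in rows:
        mean, rstd = layer_norm_stats(row, eps)
        out.append([(x - mean) * rstd * w + b
                    for x, w, b in zip(row, weight, bias)])
    return out
```

Called as `layer_norm_decomposed([[1.0, 2.0, 3.0]], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])`, the sketch normalizes the single row over its last dimension; the middle element maps to zero because it equals the row mean.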
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266
Differential Revision: D14879877
Pulled By: wanchaol
fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe