Fix mixed fused layer norm to mimic nn.LayerNorm for torch>1.11 (#281)
* If torch>=1.11 is available, use nn.LayerNorm instead of MixedFusedLayerNorm
* Add MixedFusedLayerNorm fix
* Turns out nn.LayerNorm for bf16 is slower with torch==1.11, so keep the fused path there
* Add test for LayerNorm
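The dispatch logic the commits describe can be sketched as follows. This is a minimal illustration, not the repo's actual code: the helper name `use_native_layer_norm` and its string-based arguments are hypothetical, and the version threshold and bf16 exception are taken from the commit summaries above.

```python
def use_native_layer_norm(torch_version: str, dtype: str) -> bool:
    """Hypothetical dispatch sketch: prefer torch.nn.LayerNorm on
    torch>=1.11, except for bf16, where the fused MixedFusedLayerNorm
    kernel is reported to still be faster on torch==1.11."""
    # Compare only the (major, minor) components of the version string.
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    if (major, minor) < (1, 11):
        # Older torch: keep the fused implementation for matching numerics.
        return False
    # New enough torch: use nn.LayerNorm, unless running in bf16.
    return dtype != "bfloat16"
```

In the real module, the selected class would then be instantiated with the usual `normalized_shape`, `eps`, and parameter dtype arguments.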