Transformer kernel/fix layer norm (#1587)
* fix the softmax masking when triangular masking is used
* fix a bug in the layernorm backward kernels
* revert some changes & remove debug code
* change the constants to a macro
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>