Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss. (#4509)
* Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss.
* Update BERT loss function with correct logit shapes for softmax cross entropy loss.
* Fix test and address PR comments.