Liqun/bert pretrain tb (#5377)

Commit

5 years ago

* Add TensorBoard; remove torch.distributed.launch because ORT NCCL depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
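
To illustrate what launching with MPI instead of torch.distributed.launch can mean for a training script, here is a minimal sketch, not the code from this commit. It assumes Open MPI is the launcher and reads the rank information from the environment variables Open MPI sets for each process. The helper name `mpi_world_info` is hypothetical.

```python
import os


def mpi_world_info(env=None):
    """Return (rank, local_rank, world_size) for the current process.

    Assumption: the job was started by Open MPI, which exports the
    OMPI_COMM_WORLD_* variables to every process it launches. When the
    variables are absent (a plain single-process run), fall back to
    rank 0, local rank 0, world size 1.
    """
    env = os.environ if env is None else env
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    world_size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = mpi_world_info()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With a script like this saved under a hypothetical name such as `train.py`, a four-process run would be started with something like `mpirun -n 4 python train.py`, and each process would then pick its GPU from `local_rank`.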