Extend nn.Transformer to support BERT (gelu) (#24181)
Summary:
To use `nn.Transformer` for BERT, the feed-forward layers need to support the `gelu` activation in addition to the default `relu`. https://github.com/pytorch/pytorch/issues/24177
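A minimal usage sketch, assuming the change exposes the activation through an `activation` keyword on the transformer layer constructors (hyperparameter values below are illustrative only):

```python
import torch
import torch.nn as nn

# Build a BERT-style encoder layer with gelu in the feed-forward block.
# The `activation` argument selects between "relu" (default) and "gelu".
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation="gelu")
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.rand(10, 8, 32)  # (seq_len, batch, d_model)
out = encoder(src)
print(out.shape)  # same shape as the input: (10, 8, 32)
```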
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24181
Differential Revision: D16790327
Pulled By: zhangguanheng66
fbshipit-source-id: b4eed21ad1a4d753bb090fa7fd78886714a6d761