[PyTorch] Add native fast path for transformer encoder inference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75809
The current PyTorch multi-head attention and transformer encoder
implementations are slow at inference time. This change adds a native
fast path that should speed up transformer encoder inference.
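
As a rough illustration of the scenario this targets (a sketch, not part of this PR): the fast path is relevant for encoder inference, i.e. a `nn.TransformerEncoder` run in `eval()` mode with gradients disabled and `batch_first=True`; the specific dimensions below are arbitrary.

```python
import torch
import torch.nn as nn

# Build a small transformer encoder (dimensions chosen arbitrarily for illustration).
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Inference configuration: eval mode, no autograd.
encoder.eval()
src = torch.rand(8, 16, 64)  # (batch, sequence, embedding)
with torch.inference_mode():
    out = encoder(src)

print(out.shape)  # same shape as the input: (8, 16, 64)
```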
Differential Revision: [D35239925](https://our.internmc.facebook.com/intern/diff/D35239925/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35239925/)!
Approved by: https://github.com/ezyang