Use SLEEF functions for NEON vectors on macOS ARM64 (#70354)
Summary:
We noticed that on M1 Macs Tranformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU).
The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64.
This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354
Reviewed By: albanD
Differential Revision: D33659540
Pulled By: kimishpatel
fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c
(cherry picked from commit 71286a207cefaae5a0be4eb3d618b55366ee4861)