[nnc] Expose vectorized math functions to jit fuser. (#51190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51190
We want to be able to call fast vectorized functions from sleef inside
the jit fuser, but only when they're supported by the host processor. Enabling
this feature has two parts:
1. Record the addresses of the symbols, assuming they're defined. Sleef's header only
declares its vectorized functions when AVX is enabled, so we need to define __AVX__ to
get access to those symbols. We don't actually need to compile anything with
AVX; the symbols just have to be present so we can take their addresses.
2. Before emitting a call to sleef, check whether the host processor actually has
AVX. LLVM makes this easy: we can just check the target feature string
for "+avx".
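A minimal C++ sketch of the two parts above, under stated assumptions: every name here (`scalar_expf`, `hypothetical_vec_expf8`, `buildSymbolTable`, `hasFeature`, `chooseExpSymbol`) is a hypothetical stand-in, not the fuser's or sleef's actual API, and the feature string is assumed to use LLVM's comma-separated `+feature`/`-feature` format. Part 1 records function addresses by name; part 2 selects the vectorized symbol only when `+avx` appears as a feature.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for sleef functions. In the real fuser the
// vectorized symbol would come from sleef, whose header only declares it
// when compiled with __AVX__ defined.
float scalar_expf(float x) { return std::exp(x); }
void hypothetical_vec_expf8(const float* in, float* out) {
  for (int i = 0; i < 8; ++i) out[i] = std::exp(in[i]);  // 8 float lanes, as with AVX
}

// Generic function-pointer type used only for storage; callers must cast
// back to the exact signature before calling.
using AnyFn = void (*)();

// Part 1: record symbol addresses by name so generated code can call them.
std::unordered_map<std::string, AnyFn> buildSymbolTable() {
  return {
      {"scalar_expf", reinterpret_cast<AnyFn>(&scalar_expf)},
      {"vec_expf8", reinterpret_cast<AnyFn>(&hypothetical_vec_expf8)},
  };
}

// Part 2: true if the comma-separated feature string (e.g.
// "+sse4.2,+avx,-avx512f") contains the exact token "+<name>".
bool hasFeature(const std::string& features, const std::string& name) {
  std::stringstream ss(features);
  std::string token;
  while (std::getline(ss, token, ',')) {
    if (token == "+" + name) return true;
  }
  return false;
}

// Emit a call to the vectorized symbol only when the host has AVX;
// otherwise fall back to the scalar function.
std::string chooseExpSymbol(const std::string& features) {
  return hasFeature(features, "avx") ? "vec_expf8" : "scalar_expf";
}
```

The exact-token comparison is a slightly stricter variant of the plain "+avx" check described above: a substring search would also match "+avx2" or "+avx512f", which on real hardware imply AVX anyway, so either approach is reasonable.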
ghstack-source-id: 120614086
Test Plan:
```
buck run mode -c python.package_style=inplace //caffe2/benchmarks/cpp/tensorexpr:bench_ops
```
shows a significant speedup on most math functions (especially sigmoid, which goes
from 13% of ATen's speed to parity).
Reviewed By: navahgar
Differential Revision: D26096170
fbshipit-source-id: b7268a50d73f8dc03b4db11cc38b8402387eed2d