[nnc] Refactor generation of intrinsics to reduce the amount of macro-hell (#51125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51125
The big pile of X-macros used for emitting (possibly vectorized)
intrinsics makes it **really** difficult to change that code in any systematic
way (which I'm about to do in a later diff).
We can factor most of what the macro does into a fairly simple function.
Macros remain, but they now expand only to case / call-helper / break
boilerplate.
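
A minimal sketch of the shape of this refactor, not the actual NNC code: the
names `Op`, `emitIntrinsic`, `lower`, `INTRINSIC_CASE`, and the generated symbol
strings are all hypothetical, and strings stand in for the real codegen calls.
The point is only that the per-intrinsic logic lives in one ordinary function,
while the macro shrinks to a case label, a call to the helper, and a break.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the set of intrinsics; not the real NNC enum.
enum class Op { Sin, Cos, Exp };

// One ordinary function holds the logic that each macro expansion used to
// repeat: pick a scalar or a vectorized symbol based on the lane count.
// The naming scheme here is invented purely for illustration.
std::string emitIntrinsic(const std::string& name, int lanes) {
  assert(lanes >= 1);
  if (lanes == 1) {
    return name + "f"; // scalar form, e.g. "sinf"
  }
  return "vec" + std::to_string(lanes) + "_" + name; // vectorized form
}

// The remaining macro is only case / call helper / break boilerplate.
#define INTRINSIC_CASE(op, name)         \
  case Op::op:                           \
    result = emitIntrinsic(name, lanes); \
    break;

std::string lower(Op op, int lanes) {
  std::string result;
  switch (op) {
    INTRINSIC_CASE(Sin, "sin")
    INTRINSIC_CASE(Cos, "cos")
    INTRINSIC_CASE(Exp, "exp")
  }
  return result;
}
#undef INTRINSIC_CASE
```

With this layout, a systematic change (for example, a different rule for
choosing the vectorized symbol) touches only `emitIntrinsic`, not every macro
expansion.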
ghstack-source-id: 120614089
Test Plan: `buck test mode/opt -c python.package_style=inplace //caffe2/benchmarks/cpp/tensorexpr:bench_ops`
Reviewed By: ZolotukhinM
Differential Revision: D26078384
fbshipit-source-id: 843548033f73d88c5d9a031c285b92f73be21390