[nnc] Vectorize bitwise ops (#51492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51492
We missed these ops originally. Vectorizing them helps vectorize log_fast.
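For context, a rough sketch of why bitwise ops matter here, assuming log_fast follows the usual pattern of pulling the exponent and mantissa out of the float's bit pattern. This is not the NNC implementation: `log_bits_sketch` and `log_bits_sketch_array` are hypothetical names, and `std::log` on the mantissa stands in for whatever polynomial a real fast log would use.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only, not the NNC log_fast kernel.
// Assumes x is a positive, finite, normal float (IEEE 754 binary32).
inline float log_bits_sketch(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));  // reinterpret the float as an integer

  // Shift + mask: the biased exponent lives in bits 23..30 (bias 127).
  int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;

  // Mask + or: keep the 23 mantissa bits and force the exponent field to 127,
  // which yields a float m in [1, 2).
  std::uint32_t mant_bits = (bits & 0x007FFFFFu) | 0x3F800000u;
  float m;
  std::memcpy(&m, &mant_bits, sizeof(m));

  // log(x) = exponent * ln(2) + log(m). A real fast log would replace
  // std::log(m) with a short polynomial in m; std::log keeps the sketch exact.
  return static_cast<float>(exponent) * 0.69314718056f + std::log(m);
}

// The loop body contains >>, & and |. If the code generator cannot emit vector
// forms of those ops, the whole loop cannot be vectorized.
inline void log_bits_sketch_array(const float* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = log_bits_sketch(in[i]);
  }
}
```

The sketch only shows where the shift, and, and or operations appear in a log-style kernel. The actual expression NNC builds for log_fast may differ.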
ghstack-source-id: 120783427
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
```
This may have made bench_approx faster, but the difference could be noise.
Before:
```
----------------------------------------------------------------------------
Benchmark                 Time         CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64         108 ns      108 ns    5576102 log/s=590.91M/s
log_nnc_fast/512        569 ns      569 ns    1230258 log/s=899.961M/s
log_nnc_fast/8192      8047 ns     8046 ns      89715 log/s=1018.08M/s
log_nnc_fast/32768    31066 ns    31065 ns      22368 log/s=1054.81M/s
logit_nnc_fast/64       149 ns      149 ns    4851520 logit/s=428.646M/s
logit_nnc_fast/512      980 ns      979 ns     712033 logit/s=522.742M/s
logit_nnc_fast/8192   13326 ns    13325 ns      51916 logit/s=614.805M/s
logit_nnc_fast/32768  54743 ns    54739 ns      12844 logit/s=598.624M/s
```
After:
```
----------------------------------------------------------------------------
Benchmark                 Time         CPU Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64         100 ns      100 ns    7012963 log/s=640.588M/s
log_nnc_fast/512        496 ns      496 ns    1415357 log/s=1032.26M/s
log_nnc_fast/8192      7600 ns     7595 ns      88258 log/s=1078.62M/s
log_nnc_fast/32768    30300 ns    30298 ns      22442 log/s=1081.52M/s
logit_nnc_fast/64       152 ns      152 ns    4505712 logit/s=420.279M/s
logit_nnc_fast/512      816 ns      816 ns     873834 logit/s=627.267M/s
logit_nnc_fast/8192   12090 ns    12088 ns      58234 logit/s=677.675M/s
logit_nnc_fast/32768  51576 ns    51531 ns      14645 logit/s=635.888M/s
```
Reviewed By: bwasti
Differential Revision: D26155792
fbshipit-source-id: 16724b419c944aa7d4389ae85838018455a5605f