pytorch
c77fc2ee - [nnc] Vectorize bitwise ops (#51492)

3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51492

We missed bitwise ops originally. Vectorizing them helps vectorize log_fast.

ghstack-source-id: 120783427

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
```

This might have made bench_approx faster, but the difference could be noise.

Before:
```
----------------------------------------------------------------------------
Benchmark                   Time          CPU   Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64           108 ns       108 ns      5576102 log/s=590.91M/s
log_nnc_fast/512          569 ns       569 ns      1230258 log/s=899.961M/s
log_nnc_fast/8192        8047 ns      8046 ns        89715 log/s=1018.08M/s
log_nnc_fast/32768      31066 ns     31065 ns        22368 log/s=1054.81M/s
logit_nnc_fast/64         149 ns       149 ns      4851520 logit/s=428.646M/s
logit_nnc_fast/512        980 ns       979 ns       712033 logit/s=522.742M/s
logit_nnc_fast/8192     13326 ns     13325 ns        51916 logit/s=614.805M/s
logit_nnc_fast/32768    54743 ns     54739 ns        12844 logit/s=598.624M/s
```

After:
```
----------------------------------------------------------------------------
Benchmark                   Time          CPU   Iterations UserCounters...
----------------------------------------------------------------------------
log_nnc_fast/64           100 ns       100 ns      7012963 log/s=640.588M/s
log_nnc_fast/512          496 ns       496 ns      1415357 log/s=1032.26M/s
log_nnc_fast/8192        7600 ns      7595 ns        88258 log/s=1078.62M/s
log_nnc_fast/32768      30300 ns     30298 ns        22442 log/s=1081.52M/s
logit_nnc_fast/64         152 ns       152 ns      4505712 logit/s=420.279M/s
logit_nnc_fast/512        816 ns       816 ns       873834 logit/s=627.267M/s
logit_nnc_fast/8192     12090 ns     12088 ns        58234 logit/s=677.675M/s
logit_nnc_fast/32768    51576 ns     51531 ns        14645 logit/s=635.888M/s
```

Reviewed By: bwasti

Differential Revision: D26155792

fbshipit-source-id: 16724b419c944aa7d4389ae85838018455a5605f
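To illustrate why a fast log approximation depends on bitwise ops, here is a minimal hypothetical sketch in plain C++. It is not NNC's log_fast and does not use the TensorExpr API; the names `approx_log` and `approx_log_array` are invented for illustration, and inputs are assumed to be positive, normal (non-denormal), finite floats. The exponent and mantissa are pulled out of the IEEE-754 bits with a shift, masks, and an OR, and the remaining log of the mantissa is approximated with a short series.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, NOT the actual NNC log_fast.
// Assumes x is a positive, normal, finite float.
inline float approx_log(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // reinterpret the IEEE-754 bits

  // Exponent: shift right, mask the 8 exponent bits, remove the bias of 127.
  int e = static_cast<int>((bits >> 23) & 0xFFu) - 127;

  // Mantissa m in [1, 2): keep the 23 fraction bits, OR in the exponent of 1.0f.
  std::uint32_t mbits = (bits & 0x007FFFFFu) | 0x3F800000u;
  float m;
  std::memcpy(&m, &mbits, sizeof m);

  // log(m) = 2 * atanh(t) with t = (m - 1) / (m + 1), so t is in [0, 1/3).
  // Truncating the odd series after t^7 leaves an error of roughly 1e-5.
  float t = (m - 1.0f) / (m + 1.0f);
  float t2 = t * t;
  float log_m =
      2.0f * t * (1.0f + t2 * (1.0f / 3.0f + t2 * (1.0f / 5.0f + t2 * (1.0f / 7.0f))));

  // log(x) = e * ln(2) + log(m)
  return static_cast<float>(e) * 0.69314718f + log_m;
}

// Elementwise loop: each iteration runs the same shifts, masks, and arithmetic
// on its own element, with no dependence between iterations.
void approx_log_array(const float* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = approx_log(in[i]);
  }
}
```

Because every element goes through the same shift, AND, and OR, a kernel like this can only be fully vectorized if the vectorizer also handles the bitwise ops, not just the floating-point arithmetic.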