[te] Add a benchmark harness (#45875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45875
Adds a googlebenchmark harness for perf testing programs generated by
tensorexpr, sans any pytorch wrappings (for python-level benchmarks of
tensorexpr, see benchmarks/tensorexpr).
Currently there's a harness for gemm that sets up the problem using torch (and
also measures the perf of a torch::mm to give a baseline).
Right now there's just an unoptimized implementation that is expected to be not
very fast. More optimized versions are coming.
Sample output from my dev box:
```
Run on (48 X 2501 MHz CPU s)
CPU Caches:
L1 Data 32K (x24)
L1 Instruction 32K (x24)
L2 Unified 256K (x24)
L3 Unified 30720K (x2)
--------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128 73405 ns 73403 ns 8614 GFLOPS=57.1411G/s
Gemm/TensorExprNoopt/128/128/128 3073003 ns 3072808 ns 229 GFLOPS=1.36497G/s
```
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D24142403
Pulled By: bertmaher
fbshipit-source-id: 3354aaa56868a43a553acd1ad9a192f28d8e3597