Add full reduction benchmark. (#50057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50057
As part of the effort to calibrate TE reduction performance, adding a full reduction benchmark.
Also add a "skip_input_transformation" option.
Fixed other reduction benchmarks to accept specific benchmarks that was listed.
Test plans:
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s1
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce_full_fwd_cpu_16777216_s0
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_inner_fwd_cpu_640_524288
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer
* python -m benchmarks.tensorexpr --device=cpu --mode=fwd reduce2d_outer_fwd_cpu_640_524288
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25774138
Pulled By: zheng-xq
fbshipit-source-id: fd4598e5c29991be476e42235a059e8021d4f083