Cat benchmark: use mobile feed tensor shapes and torch.cat out-variant (#50778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50778
- Use tensor shapes from the ctr_mobilefeed merge net.
- Use the PyTorch cat out-variant for a fairer comparison; otherwise the benchmark also measures the time to construct the result tensor.
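
For illustration, the out-variant call pattern can be sketched as follows. This is a minimal standalone example, not the benchmark code from this PR; the variable names are hypothetical, and the shapes are borrowed from one of the configs in the Test Plan.

```python
import torch

# Shapes borrowed from one static_runtime config; names here are illustrative only.
sizes = [(20, 160), (20, 14)]
dim = 1
inputs = [torch.rand(s) for s in sizes]

# Preallocate the result once, outside any timed region.
out_shape = list(sizes[0])
out_shape[dim] = sum(s[dim] for s in sizes)
result = torch.empty(out_shape)
ptr_before = result.data_ptr()

# Out-variant: writes into `result` instead of allocating a new tensor.
torch.cat(inputs, dim=dim, out=result)

# The regular variant allocates its output on every call.
reference = torch.cat(inputs, dim=dim)
```

Because `result` already has the right shape, the out-variant reuses its storage, so a timed loop around that call measures the copy rather than tensor construction.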
Test Plan:
Run on a devbig machine with turbo off.
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/c2/concat_test.par --tag_filter=static_runtime
```
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : static_runtime
# Benchmarking Caffe2: concat
# Name: concat_sizes(1,40)_N5_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: (1, 40), N: 5, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.619
# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,160),(1,14)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 160), (1, 14)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.369
# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,20,40),(1,4,40),(1,5,40)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 20, 40), (1, 4, 40), (1, 5, 40)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.590
# Benchmarking Caffe2: concat
# Name: concat_sizes[(1,580),(1,174)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(1, 580), (1, 174)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 0.412
# Benchmarking Caffe2: concat
# Name: concat_sizes(20,40)_N5_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: (20, 40), N: 5, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 2.464
# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,160),(20,14)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 160), (20, 14)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 1.652
# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,20,40),(20,4,40),(20,5,40)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 20, 40), (20, 4, 40), (20, 5, 40)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 9.312
# Benchmarking Caffe2: concat
# Name: concat_sizes[(20,580),(20,174)]_N-1_axis1_add_axis0_devicecpu_dtypefloat
# Input: sizes: [(20, 580), (20, 174)], N: -1, axis: 1, add_axis: 0, device: cpu, dtype: float
Forward Execution Time (us) : 6.532
```
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/pt/cat_test.par --tag_filter=static_runtime
```
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : static_runtime
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,160),(1,14)]_N-1_dim1_cpu
# Input: sizes: [(1, 160), (1, 14)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.313
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,20,40),(1,4,40),(1,5,40)]_N-1_dim1_cpu
# Input: sizes: [(1, 20, 40), (1, 4, 40), (1, 5, 40)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.680
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(1,580),(1,174)]_N-1_dim1_cpu
# Input: sizes: [(1, 580), (1, 174)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 3.452
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,160),(20,14)]_N-1_dim1_cpu
# Input: sizes: [(20, 160), (20, 14)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 4.653
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,20,40),(20,4,40),(20,5,40)]_N-1_dim1_cpu
# Input: sizes: [(20, 20, 40), (20, 4, 40), (20, 5, 40)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 7.364
# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[(20,580),(20,174)]_N-1_dim1_cpu
# Input: sizes: [(20, 580), (20, 174)], N: -1, dim: 1, device: cpu
Forward Execution Time (us) : 7.055
```
Reviewed By: hlu1
Differential Revision: D25839036
fbshipit-source-id: 7a6a234f41dfcc56246a80141fe0c84f769a5a85
Author: Marat Subkhankulov