[caffe2] Don't call TensorImpl::size() in dim32() (#53852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53852
dim32() already checks that its argument is in range, so it can index the faster `TensorImpl::sizes()` accessor directly instead of going through `TensorImpl::size()`, which repeats that bounds check.
ghstack-source-id: 123784862
Test Plan:
Ran the MergeNet AdIndexer benchmark under `perf stat`.
Before:
```
Performance counter stats for 'scripts/bwasti/static_runtime/run.sh' (5 runs):
7,008.70 msec task-clock # 0.997 CPUs utilized ( +- 0.25% )
4,203 context-switches # 0.600 K/sec ( +- 14.71% )
3 cpu-migrations # 0.000 K/sec
93,896 page-faults # 0.013 M/sec ( +- 0.80% )
13,869,719,763 cycles # 1.979 GHz ( +- 0.23% ) (50.05%)
27,561,765,867 instructions # 1.99 insn per cycle ( +- 0.06% ) (50.04%)
4,288,245,412 branches # 611.846 M/sec ( +- 0.05% ) (50.01%)
19,633,433 branch-misses # 0.46% of all branches ( +- 0.83% ) (50.01%)
# Table of individual measurements:
7.0670 (+0.0379) #
6.9897 (-0.0394) #
7.0203 (-0.0088) #
6.9829 (-0.0462) #
7.0856 (+0.0565) #
# Final result:
7.0291 +- 0.0205 seconds time elapsed ( +- 0.29% )
```
After:
```
Performance counter stats for 'scripts/bwasti/static_runtime/run.sh' (5 runs):
6,935.61 msec task-clock # 0.997 CPUs utilized ( +- 0.47% )
2,913 context-switches # 0.420 K/sec ( +- 15.25% )
3 cpu-migrations # 0.000 K/sec
92,628 page-faults # 0.013 M/sec ( +- 0.50% )
13,724,940,495 cycles # 1.979 GHz ( +- 0.47% ) (50.01%)
27,226,217,974 instructions # 1.98 insn per cycle ( +- 0.02% ) (50.03%)
4,220,129,358 branches # 608.472 M/sec ( +- 0.06% ) (50.04%)
19,025,346 branch-misses # 0.45% of all branches ( +- 0.53% ) (50.04%)
# Table of individual measurements:
6.9402 (-0.0145) #
6.8570 (-0.0978) #
6.9311 (-0.0236) #
7.0101 (+0.0554) #
7.0352 (+0.0805) #
# Final result:
6.9547 +- 0.0315 seconds time elapsed ( +- 0.45% )
```
Roughly a 1% reduction in cycles (13.87B → 13.72B), which is outside the quoted noise level.
Reviewed By: hlu1
Differential Revision: D26994107
fbshipit-source-id: f4c4963be0a5c268cbcdac5359f8278750218ae6