[PyTorch] Devirtualize TensorImpl::storage() (#51050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51050
Some TensorImpl subclasses need storage() calls to throw. Rather than
keeping storage() virtual so they can override it, we use some free space
in TensorImpl for a flag that subclasses can set to make storage() throw.
storage() is now non-virtual and should still be inlineable.
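A minimal sketch of the flag-based approach described above, not the actual TensorImpl code. All names here (`TensorImplSketch`, `OpaqueTensorImplSketch`, `Storage`, `storage_access_should_throw_`, `set_storage_access_should_throw`) are illustrative assumptions, and the real class layout may differ:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the real storage type.
struct Storage {
  int id = 0;
};

class TensorImplSketch {
 public:
  TensorImplSketch()
      : is_contiguous_(true), storage_access_should_throw_(false) {}
  virtual ~TensorImplSketch() = default;

  // Non-virtual and defined in the class body, so the compiler may inline it.
  // The only added cost on the fast path is one flag test.
  const Storage& storage() const {
    if (storage_access_should_throw_) {
      throw_storage_access_error();
    }
    return storage_;
  }

  bool is_contiguous() const { return is_contiguous_; }

 protected:
  // Subclasses without a usable storage call this, e.g. in their constructor.
  void set_storage_access_should_throw() {
    storage_access_should_throw_ = true;
  }

  Storage storage_;

 private:
  // Separate function so the throwing path stays out of the fast path.
  [[noreturn]] void throw_storage_access_error() const {
    throw std::runtime_error("Cannot access storage of this tensor type");
  }

  // The new flag shares a bit-field with an existing flag, so it takes
  // no extra space in this sketch.
  bool is_contiguous_ : 1;
  bool storage_access_should_throw_ : 1;
};

// Hypothetical subclass that has no storage to expose.
class OpaqueTensorImplSketch : public TensorImplSketch {
 public:
  OpaqueTensorImplSketch() { set_storage_access_should_throw(); }
};

// Usage: the base tensor returns its storage; the opaque one throws.
inline void demo() {
  TensorImplSketch dense;
  assert(dense.storage().id == 0);

  OpaqueTensorImplSketch opaque;
  bool threw = false;
  try {
    (void)opaque.storage();
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
}
```

The trade-off in this sketch is a predictable branch on every storage() call in exchange for dropping the virtual dispatch.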
ghstack-source-id: 121819684
Test Plan:
Compared `perf stat` output for 1M iterations of the AdIndexer benchmark before and after this change.
Before:
```
74,483.15 msec task-clock # 0.999 CPUs utilized ( +- 0.14% )
16,637 context-switches # 0.223 K/sec ( +- 11.97% )
3 cpu-migrations # 0.000 K/sec ( +- 7.20% )
107,085 page-faults # 0.001 M/sec ( +- 2.39% )
147,356,440,831 cycles # 1.978 GHz ( +- 0.14% ) (50.06%)
278,678,430,378 instructions # 1.89 insn per cycle ( +- 0.01% ) (50.05%)
43,540,698,177 branches # 584.571 M/sec ( +- 0.01% ) (50.05%)
141,028,843 branch-misses # 0.32% of all branches ( +- 1.00% ) (50.05%)
```
After:
```
74,178.77 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
17,125 context-switches # 0.231 K/sec ( +- 3.41% )
3 cpu-migrations # 0.000 K/sec
109,535 page-faults # 0.001 M/sec ( +- 1.04% )
146,803,364,372 cycles # 1.979 GHz ( +- 0.30% ) (50.03%)
277,726,600,254 instructions # 1.89 insn per cycle ( +- 0.02% ) (50.03%)
43,299,659,815 branches # 583.720 M/sec ( +- 0.03% ) (50.03%)
130,504,094 branch-misses # 0.30% of all branches ( +- 1.14% ) (50.03%)
```
Looks like an approximately 0.3% instruction count win (278.68B to 277.73B instructions). Cycles improve by a similar amount, but that's within noise.
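The percentages above can be reproduced from the raw counters. A small sketch, where `percent_reduction` is a hypothetical helper and the constants are copied from the `perf stat` output in this message:

```cpp
// Relative reduction from `before` to `after`, in percent.
inline double percent_reduction(double before, double after) {
  return (before - after) / before * 100.0;
}

// Counter values copied from the perf stat runs above.
constexpr double kInstructionsBefore = 278678430378.0;
constexpr double kInstructionsAfter = 277726600254.0;
constexpr double kCyclesBefore = 147356440831.0;
constexpr double kCyclesAfter = 146803364372.0;

// percent_reduction(kInstructionsBefore, kInstructionsAfter) is about 0.34.
// percent_reduction(kCyclesBefore, kCyclesAfter) is about 0.38, which is
// close to the +- 0.30% variation perf reports for cycles in the "after" run.
```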
Reviewed By: ezyang
Differential Revision: D26013815
fbshipit-source-id: 07939957929070e18b9981d492d8279c9bb33c55