toTensor cleanup on sparsenn & static runtime ops (#51113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51113
toTensor() on an lvalue IValue returns a reference to the contained tensor, so call sites can bind the result by reference instead of copying it.
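A minimal sketch of why this matters, assuming a hypothetical `FakeIValue`/`FakeTensor` pair (not real PyTorch types) modeled on the ref-qualified overloads of `c10::IValue::toTensor()`: the `&&` overload returns by value, while the `&` and `const&` overloads return references. `FakeTensor` counts its copy constructions. In real PyTorch, copying an `at::Tensor` bumps an atomic refcount on the underlying `TensorImpl`, which is the kind of overhead this cleanup removes.

```cpp
#include <utility>

// Hypothetical counter of FakeTensor copy constructions/assignments.
int fakeTensorCopies = 0;

// Hypothetical stand-in for at::Tensor that records every copy.
struct FakeTensor {
  FakeTensor() = default;
  FakeTensor(const FakeTensor&) { ++fakeTensorCopies; }
  FakeTensor(FakeTensor&&) noexcept = default;
  FakeTensor& operator=(const FakeTensor&) { ++fakeTensorCopies; return *this; }
  FakeTensor& operator=(FakeTensor&&) noexcept = default;
};

// Hypothetical stand-in for an IValue holding a tensor, with accessors
// modeled on the ref-qualified toTensor() overloads.
class FakeIValue {
 public:
  explicit FakeIValue(FakeTensor t) : tensor_(std::move(t)) {}
  FakeTensor toTensor() && { return std::move(tensor_); }  // rvalue: move out
  FakeTensor& toTensor() & { return tensor_; }             // lvalue: reference
  const FakeTensor& toTensor() const& { return tensor_; }  // const lvalue: reference
 private:
  FakeTensor tensor_;
};

// Pattern being cleaned up: `auto` deduces a value type, forcing a copy.
int copiesWithAuto(const FakeIValue& v) {
  fakeTensorCopies = 0;
  auto t = v.toTensor();  // copy-constructs from the returned const reference
  (void)t;
  return fakeTensorCopies;
}

// Cleaned-up pattern: bind the returned reference directly, no copy.
int copiesWithConstRef(const FakeIValue& v) {
  fakeTensorCopies = 0;
  const auto& t = v.toTensor();
  (void)t;
  return fakeTensorCopies;
}
```

With an lvalue `FakeIValue`, `copiesWithAuto` performs one copy and `copiesWithConstRef` performs none; calling `std::move(v).toTensor()` selects the `&&` overload and moves instead of copying.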
ghstack-source-id: 120317233
Test Plan:
fitsships
Compared `perf stat` results before and after this change (measured on top of a diff stack, so the baseline does not reflect current master).
Before:
```
74,178.77 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
17,125 context-switches # 0.231 K/sec ( +- 3.41% )
3 cpu-migrations # 0.000 K/sec
109,535 page-faults # 0.001 M/sec ( +- 1.04% )
146,803,364,372 cycles # 1.979 GHz ( +- 0.30% ) (50.03%)
277,726,600,254 instructions # 1.89 insn per cycle ( +- 0.02% ) (50.03%)
43,299,659,815 branches # 583.720 M/sec ( +- 0.03% ) (50.03%)
130,504,094 branch-misses # 0.30% of all branches ( +- 1.14% ) (50.03%)
```
After:
```
72,695.01 msec task-clock # 0.999 CPUs utilized ( +- 1.18% )
15,994 context-switches # 0.220 K/sec ( +- 5.21% )
3 cpu-migrations # 0.000 K/sec
107,743 page-faults # 0.001 M/sec ( +- 1.55% )
145,647,684,269 cycles # 2.004 GHz ( +- 0.30% ) (50.05%)
277,341,084,993 instructions # 1.90 insn per cycle ( +- 0.02% ) (50.04%)
43,200,717,263 branches # 594.273 M/sec ( +- 0.02% ) (50.05%)
143,873,086 branch-misses # 0.33% of all branches ( +- 0.59% ) (50.05%)
```
This looks like a ~0.8% reduction in cycles (barely outside the noise) and a ~0.1% reduction in instructions.
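To check the arithmetic, a small helper that computes the relative reduction between the Before and After counters. The function name `percentReduction` and the constant names are illustrative, not from the original change; the counter values are copied from the `perf stat` output.

```cpp
// Relative reduction from `before` to `after`, in percent.
// Positive means the "after" run used less of the counted resource.
double percentReduction(double before, double after) {
  return (before - after) / before * 100.0;
}

// Counter values copied from the Before/After perf stat output.
// All fit exactly in a double (well below 2^53).
const double kCyclesBefore = 146803364372.0;
const double kCyclesAfter = 145647684269.0;
const double kInstructionsBefore = 277726600254.0;
const double kInstructionsAfter = 277341084993.0;
```

`percentReduction(kCyclesBefore, kCyclesAfter)` evaluates to about 0.79%, and `percentReduction(kInstructionsBefore, kInstructionsAfter)` to about 0.14%.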
Reviewed By: hlu1
Differential Revision: D26051766
fbshipit-source-id: 05f8d71d8120d79f7cd80aca747dfc537bf7d382