[Pytorch] Specialize guts of c10::optional for 32-bit scalars (#47015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47015
c10::optional always has non-trivial copy and move operations. This change specializes its internals for 32-bit scalars so that copy and move are trivial in that case. Ideally, we would instead rely on P0602 ("variant and optional should propagate copy/move triviality"), either by using `std::optional` or by implementing that behavior ourselves. We can't use `std::optional` because we are stuck with C++14, and implementing the full P0602 ourselves would add even more complexity. We could do it, but this should be a helpful first step.
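For illustration only, here is a minimal C++14 sketch of the idea, not the actual c10::optional code. All names (`GeneralStorage`, `TrivialStorage`, `is_32bit_scalar`, `Storage`) are hypothetical, and the predicate `std::is_scalar<T>::value && sizeof(T) <= 4` is an assumption about what "32-bit scalar" means here.

```cpp
#include <cstdint>
#include <new>
#include <type_traits>

namespace sketch {  // hypothetical names; not the c10 implementation

// General case: the copy constructor and destructor are user-provided
// because T may be non-trivial, so this type is never trivially copyable,
// even when T is int.
template <typename T>
class GeneralStorage {
 public:
  GeneralStorage() noexcept : init_(false), dummy_(0) {}
  explicit GeneralStorage(const T& v) : init_(true), value_(v) {}
  GeneralStorage(const GeneralStorage& rhs) : init_(rhs.init_), dummy_(0) {
    if (init_) {
      ::new (static_cast<void*>(&value_)) T(rhs.value_);
    }
  }
  ~GeneralStorage() {
    if (init_) {
      value_.~T();
    }
  }
  bool has_value() const noexcept { return init_; }
  const T& value() const { return value_; }

 private:
  bool init_;
  union {
    char dummy_;
    T value_;
  };
};

// Small-scalar case: plain members and implicit special members,
// so copy and move are trivial.
template <typename T>
class TrivialStorage {
 public:
  constexpr TrivialStorage() noexcept : init_(false), value_() {}
  explicit constexpr TrivialStorage(T v) noexcept : init_(true), value_(v) {}
  constexpr bool has_value() const noexcept { return init_; }
  constexpr T value() const noexcept { return value_; }

 private:
  bool init_;
  T value_;
};

// Assumed predicate for "32-bit scalar".
template <typename T>
using is_32bit_scalar =
    std::integral_constant<bool, std::is_scalar<T>::value && sizeof(T) <= 4>;

// Pick the storage at compile time.
template <typename T>
using Storage = typename std::conditional<
    is_32bit_scalar<T>::value,
    TrivialStorage<T>,
    GeneralStorage<T>>::type;

static_assert(std::is_trivially_copyable<Storage<std::int32_t>>::value, "");
static_assert(std::is_trivially_copyable<Storage<float>>::value, "");
static_assert(!std::is_trivially_copyable<GeneralStorage<std::int32_t>>::value, "");
static_assert(!std::is_trivially_copyable<Storage<double>>::value, "");

}  // namespace sketch
```

The `static_assert`s show the difference: the general storage is not trivially copyable even for `int32_t`, while the specialized one is. A full P0602 implementation would instead propagate triviality for every trivially copyable `T`, not just small scalars.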
ghstack-source-id: 115886743
Test Plan:
Collect Callgrind instruction counts for `torch.empty(())`. Data:
Make empty c10-ful (https://github.com/pytorch/pytorch/pull/46092):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7ffaed1128e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       648005                     632899
    Baseline:             4144                       3736
100 runs per measurement, 1 thread
```
This diff atop #46092:
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f943f1dc8e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       602347                     591005
    Baseline:             4106                       3736
100 runs per measurement, 1 thread
```
(6.6% improvement vs #46092, comparing counts with noisy symbols removed)
Pass optionals by const reference (https://github.com/pytorch/pytorch/pull/46598):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f1abb3988e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       601349                     590005
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
```
(6.8% improvement vs #46092, comparing counts with noisy symbols removed)
This diff atop #46598 (i.e., both together):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f9577c22850>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       596095                     582451
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
Source information may be limited. Rebuild with
REL_WITH_DEB_INFO=1 for more detailed results.
```
(another 1.3% savings vs #46598 alone!)
#46598 outperformed this change slightly, and combining the two leads to further benefits. I guess we should do both! (Though I still don't understand why passing optionals that should fit in a register by const reference would help...)
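The quoted percentages appear to come from the "Noisy symbols removed" instruction counts. A small sketch of that arithmetic, using the counts reported above (the helper name `percent_reduction` and the constant names are hypothetical):

```cpp
#include <cstdint>

// Percent reduction in instruction count going from `before` to `after`.
constexpr double percent_reduction(std::int64_t before, std::int64_t after) {
  return 100.0 * static_cast<double>(before - after) /
      static_cast<double>(before);
}

// "Noisy symbols removed" instruction counts from the runs above.
constexpr std::int64_t kBase46092 = 632899;  // #46092 alone
constexpr std::int64_t kThisDiff = 591005;   // this diff atop #46092
constexpr std::int64_t kBase46598 = 590005;  // #46598
constexpr std::int64_t kBoth = 582451;       // this diff atop #46598

constexpr double kThisVs46092 = percent_reduction(kBase46092, kThisDiff);    // about 6.6
constexpr double k46598Vs46092 = percent_reduction(kBase46092, kBase46598);  // about 6.8
constexpr double kBothVs46598 = percent_reduction(kBase46598, kBoth);        // about 1.3
constexpr double kBothVs46092 = percent_reduction(kBase46092, kBoth);        // about 8.0
```

By the same calculation, the two changes together come to roughly 8.0% fewer instructions than #46092 alone.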
Reviewed By: smessmer
Differential Revision: D24552280
fbshipit-source-id: 4d93bfcffafebd8c01559398513fa6b9db959d11