Optimize view_as_complex and view_as_real (#44908)
Summary:
This avoids unnecessary memory allocations in `view_as_complex` and `view_as_real`. I construct the new tensor directly on the existing storage, which avoids creating a new storage object, and use `DimVector`s so that the sizes and strides do not need a heap allocation. Overall, this saves about 2 µs of overhead in `torch.fft.fft`, which currently has to call both `view_as_real` and `view_as_complex` on every invocation.
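For context, both functions return views over the input's memory, which is why reusing the existing storage is possible. The following is only a small user-level sketch of that property, not the C++ change itself; the tensor shape and variable names are chosen for illustration.

```python
import torch

a = torch.rand(3, 2)            # float32, last dimension of size 2
ac = torch.view_as_complex(a)   # complex64 view, shape (3,)
ar = torch.view_as_real(ac)     # float32 view, shape (3, 2)

# All three tensors start at the same address, so no data was copied.
same_memory = a.data_ptr() == ac.data_ptr() == ar.data_ptr()
print(same_memory)

# A write through the real tensor is visible through the complex view.
a[0, 0] = 5.0
print(ac[0].real.item())
```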
I've used this simple benchmark to measure the overhead:
```python
In [1]: import torch
...: a = torch.rand(1, 2)
...: ac = torch.view_as_complex(a)
...: %timeit torch.view_as_real(ac)
...: %timeit torch.view_as_complex(a)
...: %timeit ac.real
```
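Outside IPython, roughly the same measurement can be taken with the standard `timeit` module. This is a sketch rather than the script used for the numbers in this message; the `bench` helper and its default loop counts are assumptions made for illustration, and absolute timings will differ between machines.

```python
import timeit

import torch


def bench(fn, number=10_000, repeat=5):
    """Return the best observed time per call, in microseconds."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1e6


a = torch.rand(1, 2)
ac = torch.view_as_complex(a)

cases = [
    ("view_as_real", lambda: torch.view_as_real(ac)),
    ("view_as_complex", lambda: torch.view_as_complex(a)),
    ("ac.real", lambda: ac.real),
]

for name, fn in cases:
    print(f"{name}: {bench(fn):.2f} us per call")
```

Note that this reports the minimum over the repeats, whereas `%timeit` prints a mean and standard deviation, so the two sets of figures are not directly comparable.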
Results before:
```
2.5 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.22 µs ± 36 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.17 µs ± 8.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
and after:
```
1.83 µs ± 9.26 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.57 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.47 µs ± 34.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44908
Reviewed By: agolynski
Differential Revision: D23793479
Pulled By: anjali411
fbshipit-source-id: 64b9cad70e3ec10891310cbfa8c0bdaa1d72885b