CPU-Strided-Complex Support for reduce ops and linpack ops (#27653)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)
Changes so far:
- [x] Renamed references to variable "I" that may be confused for "I" defined in complex.h. I did this to avoid crazy CI failures messages as complex.h is included by more source files.
- aten/src/ATen/native/cpu/Loops.h (Renamed I to INDEX)
- aten/src/ATen/native/cuda/Loops.cuh (Renamed I to INDEX)
- aten/src/ATen/core/ivalue_inl.h (Renamed I to INDEX)
- c10/util/Array.h (Renamed I to INDEX)
- c10/util/C++17.h (Renamed I to INDEX)
- c10/util/Metaprogramming.h (Renamed I to INDEX)
- c10/util/SmallVector.h (custom renaming)
- [x] Added complex support of Linear Algebra Ops.
- SVD needed to be modified to support mixed data types
- Example U(std::complex<double)), S(double), V(std::complex<double>)
- See before and after benchmark below (No observable change in performance).
- [x] Added complex support of Reduce Ops.
- var/std computations could have been faster if it was possible to interpret std::complex<double> Tensor as a double Tensor.
- [x] Added complex derivative support for autograd functionality.
- derivatives are the same as defined by numpy autograd library for real(), imag(), conj(), angle(). These functions only affect complex numbers.
- derivative of abs() has not been modified to not interfere with existing code.
- Autograd defines abs() for complex numbers and fabs() for real numbers. I will look into this further down the road.
----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks Before Changes
----------------------------------------
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162339.425
Forward Execution Time (us) : 162517.479
Forward Execution Time (us) : 162847.775
----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks After Changes
----------------------------------------
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162032.117
Forward Execution Time (us) : 161943.484
Forward Execution Time (us) : 162513.786
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27653
Differential Revision: D17907886
Pulled By: ezyang
fbshipit-source-id: a88b6d0427591ec1fba09e97c880f535c5d0e513