CPU-strided-complex support for compare and pointwise ops (#28735)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)
These changes optimize the complex Vec256 math kernels so that they are within 2X of real-number performance on average. [Benchmarks are here](https://docs.google.com/spreadsheets/d/17pObcrSTpV4BOOX9FYf1vIX3QUlEgQhLvL1IBEyJyzs/edit#gid=0)
Changes so far:
- [x] Added complex support for eq, neq, max, and min ops.
- max/min ops need to compare the absolute value for complex numbers (using zabs).
- [x] Added complex support for is_nonzero and where.
- where op compares the absolute value for complex numbers (using zabs).
- [x] Added complex support for linear interpolation and pointwise ops.
- [x] Added complex support for check_convert and Linspace/Logspace.
- `std::complex` does not support `operator++`.
- The clang, g++, and c++ compilers on aarch64 and x86 all produce the same assembly code when using `+= 1` instead of `++`. [example for loop](https://godbolt.org/z/O6NW_p)
- [x] Added complex support for log, log2, log10.
- [x] Optimized Vec256 operators using various logarithmic identities.
- `asin()`, `acos()`, and `atan()` are optimized using a `ln()` identity.
- `sqrt()` is optimized by splitting the computation into real and imag parts.
- several `_mm256_mul_pd` are avoided by using `_mm256_xor_pd` ops instead.
- [x] Added complex support for pow.
- The exponent (`exp`) is cast to `std::complex<double>`.
- No special optimization is added when `exp` is real, because `std::pow()` is called with a `std::complex` exponent.
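
Illustrative sketch (not the code in this PR): the scalar C++ below shows three of the ideas listed above, assuming plain `std::complex<double>` rather than the Vec256 kernels. The helper names `max_by_abs`, `min_by_abs`, `linspace_complex`, and `asin_via_log` are hypothetical and exist only for this example. The `asin` formula is the textbook identity `asin(z) = -i * ln(i*z + sqrt(1 - z^2))`; the vectorized kernels may arrange the computation differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cdouble = std::complex<double>;

// Hypothetical helper: complex numbers have no natural ordering, so
// max/min are decided by comparing magnitudes (absolute values).
inline cdouble max_by_abs(cdouble a, cdouble b) {
  return std::abs(a) < std::abs(b) ? b : a;
}

inline cdouble min_by_abs(cdouble a, cdouble b) {
  return std::abs(b) < std::abs(a) ? b : a;
}

// Hypothetical helper: a linspace-style loop with a complex-typed counter.
// std::complex has no operator++, so the counter advances with `+= 1`.
std::vector<cdouble> linspace_complex(cdouble start, cdouble end, int steps) {
  std::vector<cdouble> out;
  if (steps <= 0) {
    return out;
  }
  if (steps == 1) {
    out.push_back(start);
    return out;
  }
  out.reserve(static_cast<std::size_t>(steps));
  const cdouble step = (end - start) / static_cast<double>(steps - 1);
  cdouble i(0.0, 0.0);  // complex-typed counter; `++i` would not compile
  for (int k = 0; k < steps; ++k, i += 1) {
    out.push_back(start + step * i);
  }
  return out;
}

// Hypothetical helper: asin expressed through the complex logarithm,
// asin(z) = -i * ln(i*z + sqrt(1 - z^2)).
cdouble asin_via_log(cdouble z) {
  const cdouble i(0.0, 1.0);
  return -i * std::log(i * z + std::sqrt(1.0 - z * z));
}
```

For example, `max_by_abs(cdouble(3, 4), cdouble(0, 1))` picks `3+4i` because its magnitude is larger, and `asin_via_log(z)` should agree with `std::asin(z)` up to rounding error.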
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28735
Differential Revision: D18170691
Pulled By: ezyang
fbshipit-source-id: 6f167398e112cdeab02fcfde8b543cb6629c865a