Implement torch.addr using TensorIterator-based kernels (#47664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47313
This PR implements the `torch.addr` function using `TensorIterator` with `cpu_kernel_vec` and `gpu_kernel`.
It reduces memory usage, improves performance, and fixes a bug that occurred when `beta` or `alpha` is a complex number.
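For context, `torch.addr` computes `beta * input + alpha * outer(vec1, vec2)`. The following is a minimal pure-Python sketch of that formula, not the kernel added in this PR; the helper name `addr_reference` and the nested-list representation are hypothetical and used only for illustration. It shows the element-wise form that a `TensorIterator` kernel would evaluate, including the case where the scalars are complex.

```python
def addr_reference(inp, vec1, vec2, beta=1, alpha=1):
    """Hypothetical reference: beta * inp + alpha * outer(vec1, vec2).

    `inp` is an n x m nested list; `vec1` has length n, `vec2` length m.
    Python's built-in complex type handles complex `beta` and `alpha`.
    """
    n, m = len(vec1), len(vec2)
    if len(inp) != n or any(len(row) != m for row in inp):
        raise ValueError("inp must have shape (len(vec1), len(vec2))")
    # Each output element depends only on inp[i][j], vec1[i], and vec2[j],
    # which is what makes an element-wise kernel formulation possible.
    return [
        [beta * inp[i][j] + alpha * vec1[i] * vec2[j] for j in range(m)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Real-valued case.
    print(addr_reference([[0, 0], [0, 0]], [1, 2], [3, 4]))
    # Complex scalars, the case this PR's bug fix concerns.
    print(addr_reference([[1, 1]], [1], [1j, 2], beta=1j, alpha=2))
```

A reference like this can serve as an oracle in tests by comparing its output against `torch.addr` for small inputs.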
Todo
- [x] Benchmark `torch.addr` with the changes in this PR against the legacy TH implementation used in PyTorch 1.6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47664
Reviewed By: zhangguanheng66
Differential Revision: D25059693
Pulled By: ngimel
fbshipit-source-id: 20a90824aa4cb2240e81a9f17a9e2f16ae6e3437