Improves precision of linspace, logspace (#35461)
Summary:
The Torch algorithms for linspace and logspace conceptually compute each of their values using:
`start_value + step_value * idx`
[And NumPy does the same,](https://github.com/numpy/numpy/blob/cef4dc9d91d980c3df816de00154a22e96439fa4/numpy/core/function_base.py#L24) except NumPy then [sets the last value in its array directly.](https://github.com/numpy/numpy/blob/cef4dc9d91d980c3df816de00154a22e96439fa4/numpy/core/function_base.py#L162) This is because the above computation is numerically unstable in floating point, and NumPy's contract, like PyTorch's, is that the last element in the array is the stop value.
In PyTorch this can cause the computed last value to diverge from the requested stop value. One user-reported case was:
`torch.linspace(-0.031608279794, 0.031531572342, 257, dtype=torch.float32)`
This produces a difference of 3.7253e-09 between the last value as set directly by NumPy and the last value as computed by PyTorch. After this PR the difference is zero.
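The instability can be reproduced with NumPy arithmetic alone. The following is an illustrative sketch (not the actual kernel code) that mimics the naive `start + step * idx` accumulation in float32 for the reported case, alongside NumPy's endpoint fixup:

```python
import numpy as np

# The reported case, computed naively in float32.
start = np.float32(-0.031608279794)
stop = np.float32(0.031531572342)
steps = 257

step = (stop - start) / np.float32(steps - 1)
naive = start + step * np.arange(steps, dtype=np.float32)  # start + step * idx

# NumPy's linspace overwrites the final element with `stop` directly,
# so the endpoint is exact regardless of accumulated rounding error.
fixed = naive.copy()
fixed[-1] = stop

print("naive endpoint error:", abs(float(naive[-1]) - float(stop)))
print("fixed endpoint error:", abs(float(fixed[-1]) - float(stop)))
```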
Instead of simply setting the last element of the tensor, this PR updates the kernels with a "symmetric" algorithm that sets the first and last array elements exactly without requiring an additional kernel launch on CUDA. The performance impact of this change seems small. I tested with sizes of 2^8 and 2^22, and all timing differences were imperceptible except for 2^22 on CPU, which appears to have suffered a ~5% slowdown. I think that's an acceptable performance hit for the improved precision when we consider the context of linspace.
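A symmetric scheme of this kind can be sketched in NumPy as below. This is an illustrative model only (the real kernels are C++/CUDA, and details may differ): the first half of the indices counts forward from `start` and the second half counts backward from `stop`, so both endpoints are exact in a single elementwise pass:

```python
import numpy as np

def symmetric_linspace(start, stop, steps, dtype=np.float32):
    """Illustrative sketch of a symmetric linspace: indices below the
    midpoint count forward from `start`; the rest count backward from
    `stop`. Both endpoints are exact without a separate fixup pass."""
    start, stop = dtype(start), dtype(stop)
    if steps == 1:
        return np.array([start], dtype=dtype)
    step = (stop - start) / dtype(steps - 1)
    idx = np.arange(steps)
    halfway = steps // 2
    fwd = start + step * idx.astype(dtype)                # forward from start
    bwd = stop - step * (steps - 1 - idx).astype(dtype)   # backward from stop
    return np.where(idx < halfway, fwd, bwd).astype(dtype)
```

Because element 0 is `start + step * 0` and element `steps - 1` is `stop - step * 0`, no rounding error can reach the endpoints, and the computation remains a single elementwise map suitable for one kernel launch.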
An alternative would be to simply set the last element on CPU, as NumPy does. But I think it's preferable to keep the CPU and CUDA algorithms aligned and keep the algorithm symmetric. In current PyTorch, for example, torch.linspace initially generates values very close to NumPy's, but the error grows with the index, giving our current implementation a "left bias."
Two tests are added to test_torch.py for this behavior. The linspace test will fail on current PyTorch, but the logspace test will succeed because its more complex computation requires wider error bars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35461
Differential Revision: D20712539
Pulled By: mruberry
fbshipit-source-id: 2c1257c8706f4cdf080ff0331bbf2f7041ab9adf