Use scalar implementation to keep the precision in linspace of integral types (#89048)
Fixes #88652
In the CPU implementation of linspace for integral types, the vectorized path stores `base` as `int64_t`, which drops precision when `base` is derived from a floating-point value. The vectorized path is also prone to catastrophic cancellation in floating-point arithmetic, since neither `base` (`start + step * idx`) nor `step` is exact. The scalar implementation does not have this problem, because `start` is always an integer and the result is truncated to an integer as well.
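As a rough illustration of the first effect, here is a plain-Python model, not the actual ATen kernel. The function names, the lane `width` of 2, and the simplified `start + step * idx` formula are assumptions made for this sketch, and it assumes `steps >= 2`. It compares a per-element computation with a chunked one whose `base` is truncated to an integer before the per-lane offsets are added.

```python
def linspace_int_scalar(start, end, steps):
    """Per-element model: compute in floating point, truncate once at the end."""
    step = (end - start) / (steps - 1)
    return [int(start + step * idx) for idx in range(steps)]


def linspace_int_truncated_base(start, end, steps, width=2):
    """Chunked model: `base` is truncated to an integer before lane offsets are added."""
    step = (end - start) / (steps - 1)
    out = []
    for idx in range(0, steps, width):
        # The fractional part of the chunk's first value is lost here.
        base = int(start + step * idx)
        for lane in range(min(width, steps - idx)):
            out.append(int(base + step * lane))
    return out


scalar = linspace_int_scalar(0, 6, 9)
chunked = linspace_int_truncated_base(0, 6, 9)
print(scalar)   # [0, 0, 1, 2, 3, 3, 4, 5, 6]
print(chunked)  # [0, 0, 1, 1, 3, 3, 4, 4, 6]
```

With `step = 0.75`, the chunk starting at index 2 has a true first value of 1.5. Truncating it to 1 makes the next lane evaluate to `int(1.75) = 1` instead of `int(2.25) = 2`, so the two models disagree at indices 3 and 7.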
Therefore, this PR skips the vectorized implementation for integral types, since vectorization does not contribute to performance here anyway. CPU and GPU now behave the same. In some cases the results match numpy's. In other cases they differ from numpy's, but that difference is not device-related (it is the same on CPU and GPU). See https://github.com/pytorch/pytorch/issues/81996#issuecomment-1192980485
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89048
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/albanD