Improve linspace decomposition and remove its lowering (#91621)
The lowering and the decomposition now produce the same code, apart from
a cast to `float32`. The cast is necessary because the tests otherwise
fail with accuracy errors. We prefer accuracy over speed here: this is
an associative scan, and is thus prone to numerical errors.
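
The accuracy concern can be illustrated with a minimal pure-Python sketch.
It is not the decomposition from this PR: `round_to`, `running_linspace`
and `max_abs_error` are hypothetical helpers, float16 is assumed as the
low-precision output dtype, and a running sum stands in for the scan.
Intermediates are rounded either to float16 or to `float32` before the
final cast.

```python
import struct


def round_to(value, fmt):
    """Round a Python float to a struct format ('e' = float16, 'f' = float32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def running_linspace(start, end, steps, compute_fmt, out_fmt="e"):
    """Hypothetical helper (assumes steps >= 2): accumulate a fixed step,
    rounding every intermediate to compute_fmt, then cast to out_fmt."""
    step = round_to((end - start) / (steps - 1), compute_fmt)
    acc = round_to(start, compute_fmt)
    out = []
    for _ in range(steps):
        out.append(round_to(acc, out_fmt))
        # Each addition is rounded, so errors accumulate along the scan.
        acc = round_to(acc + step, compute_fmt)
    return out


def max_abs_error(values, start, end):
    """Largest deviation from the points computed in double precision."""
    steps = len(values)
    return max(
        abs(v - (start + i * (end - start) / (steps - 1)))
        for i, v in enumerate(values)
    )


low = running_linspace(0.0, 1.0, 1000, compute_fmt="e")   # float16 throughout
high = running_linspace(0.0, 1.0, 1000, compute_fmt="f")  # float32, then cast
err_low = max_abs_error(low, 0.0, 1.0)
err_high = max_abs_error(high, 0.0, 1.0)
```

Under these assumptions `err_high` stays below `err_low`: accumulating in
`float32` and casting once at the end loses less than rounding every
intermediate to the output precision.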
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91621
Approved by: https://github.com/ngimel