Fix `TensorIterator::view_offsets_` size (#37214)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37084
There are three alternative designs; this PR implements the first one.
When a tensor is a scalar (`ndim == 0`), accessing `view_offsets_[0]` during a reduction yields an invalid offset for the index that `argmax` and `argmin` output.
https://github.com/pytorch/pytorch/blob/fba9b9a023a107c9ad33bfcc834146e362616b19/aten/src/ATen/native/cpu/Reduce.h#L217
The same happens in the CUDA code:
https://github.com/pytorch/pytorch/blob/fba9b9a023a107c9ad33bfcc834146e362616b19/aten/src/ATen/native/cuda/Reduce.cuh#L797
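A minimal repro sketch of the symptom (the tensor value is illustrative; before this fix, the `dim`-specified call could read an invalid offset and return a wrong index):

```python
import torch

t = torch.tensor(5.0)          # 0-dim (scalar) tensor, ndim == 0
print(torch.argmax(t))         # dim omitted: scalar is reshaped to 1-D, fine
print(torch.argmax(t, dim=0))  # dim given: input stays 0-dim; pre-fix this
                               # path read view_offsets_[0] out of bounds
```

With the fix applied, both calls print `tensor(0)`.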
The second alternative is to check the size of `view_offsets` before accessing it, but this puts the burden of the check on every access site.
The third alternative concerns how `argmax` and `argmin` treat their input depending on the `dim` argument value.
https://github.com/pytorch/pytorch/blob/fba9b9a023a107c9ad33bfcc834146e362616b19/aten/src/ATen/native/ReduceOps.cpp#L775-L780
If `dim` is not specified, the scalar is reshaped into a 1-dim tensor and everything works properly, since `view_offsets` then has an actual entry.
If `dim` is specified, the input remains a scalar, causing the issue seen here.
This PR aims to solve the problem generically for every case, so I went with option 1. I am happy to discuss it and switch if you think one of the other alternatives is better.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37214
Differential Revision: D21258320
Pulled By: ngimel
fbshipit-source-id: 46223412187bbba4bfa7337e3f1d2518db72dea2