numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:
```
In [15]: x=torch.randn((1000,1000))
In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [17]: y=x.numpy()
In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [20]: x=x.t()
In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [24]: y=x.numpy()
In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293
Differential Revision: D15358754
Pulled By: umanwizard
fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1