9080f1c5 - Rewrite argmax and argmin as TensorIterator reductions (#26181)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/8817

This rewrites `argmax` and `argmin` to use `TensorIterator`, as ngimel suggested in https://github.com/pytorch/pytorch/issues/8817. To support this, the reduction operation is now passed the index along with the current element (sketched below). I also had to change a few places where the input and output tensor `dtype`s were assumed to be the same.

Unfortunately, this isn't enough to reimplement the variants of `min` and `max` that return indices: several places assume that multiple tensor outputs all share the same `dtype`, so returning `pair<scalar_t, int64_t>` from `ops.project` isn't possible.

#### Performance Results

**Edit:** These timings are invalid; see below for a better perf comparison.

Timings reported by [`argmax.py`](https://gist.github.com/SsnL/6898c240d22faa91da16fc41359756a2):

```
cuda : 0.1432
cpu  : 26.976
numpy: 2.1350
```

So the `TensorIterator` reductions are much faster on the GPU but significantly slower on the CPU. `htop` shows the CPU kernel using 4 cores during the reduction, so it's not clear what the issue is. Should I just revert to the old implementation on CPU, or is it worth investigating further? I see that `numpy` is similarly faster than other `TensorIterator` CPU reductions too, e.g. `max`, `mean`, and `std`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26181

Differential Revision: D17631979

Pulled By: pbelevich

fbshipit-source-id: 58424818ef32cef031d436cb6191e9a6ca478581
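For illustration, here is a minimal, self-contained C++ sketch of the pattern the message describes: the reduction's accumulator carries a (value, index) pair, the combine step receives the index along with the current element, and the projection step emits an index whose dtype differs from the input's. This is not the actual `TensorIterator` API; the names `combine`, `project`, and `acc_t` merely echo the `ops` vocabulary used above.

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <utility>
#include <vector>

// Accumulator carries the running maximum together with its index --
// the key change: the reduction op sees (value, index), not just value.
using acc_t = std::pair<float, int64_t>;

// combine: fold one (value, index) element into the accumulator.
acc_t combine(acc_t acc, float value, int64_t idx) {
  return value > acc.first ? acc_t{value, idx} : acc;
}

// project: map the final accumulator to the output. For argmax the
// output dtype (int64_t) differs from the input dtype (float), which
// is why the input/output dtype-equality assumptions had to go.
int64_t project(const acc_t& acc) {
  return acc.second;
}

int64_t argmax(const std::vector<float>& data) {
  acc_t acc{std::numeric_limits<float>::lowest(), -1};
  for (int64_t i = 0; i < static_cast<int64_t>(data.size()); ++i) {
    acc = combine(acc, data[i], i);
  }
  return project(acc);
}

int main() {
  std::vector<float> v{0.5f, 3.0f, -1.0f, 2.0f};
  std::cout << argmax(v) << "\n";  // prints 1
}
```

Returning `pair<scalar_t, int64_t>` from `project` itself (as the `min`/`max`-with-indices variants would need) is exactly what the message says the current output-dtype assumptions rule out.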