pytorch
95b1bc10 - Migrate nonzero from TH to ATen (CPU) (#58811)

3 years ago
Summary:
Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.

|   Shape    | Before  | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2220 us |            496 us |
| 128,128,32 | 1250 us |           976 us |            175 us |
| 64,128,32  |  581 us |           486 us |             88 us |
| 32,128,32  |  292 us |           245 us |             80 us |
| 16,128,32  |  147 us |           120 us |             71 us |
| 8,128,32   |   75 us |            61 us |             61 us |
| 4,128,32   |   39 us |            32 us |             32 us |
| 2,128,32   |   20 us |            17 us |             17 us |
| 1,128,32   |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811

Reviewed By: anjali411

Differential Revision: D28700259

Pulled By: ngimel

fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
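The per-thread counting idea described in the summary can be illustrated with a small pure-Python sketch. This is not the ATen implementation: the function name `parallel_nonzero`, the chunking scheme, and the use of `ThreadPoolExecutor` are assumptions made for illustration. It shows why per-chunk counts are needed (an exclusive prefix sum over them tells each worker where its output rows start) and why the input must be walked in linear, row-major order for the result to come out sorted.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def parallel_nonzero(flat, shape, num_threads=4):
    """Return multi-indices of the nonzero entries of a row-major buffer.

    Hypothetical illustration of a two-pass scheme; not PyTorch code.
    """
    n = len(flat)
    # Split the linear index range into contiguous chunks, one per worker.
    chunk = max(1, -(-n // num_threads))  # ceiling division
    ranges = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    def count(r):
        lo, hi = r
        return sum(1 for i in range(lo, hi) if flat[i] != 0)

    def unravel(i):
        # Convert a linear row-major offset into a multi-index.
        idx = []
        for dim in reversed(shape):
            i, rem = divmod(i, dim)
            idx.append(rem)
        return tuple(reversed(idx))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Pass 1: count nonzeros per chunk.
        counts = list(pool.map(count, ranges))
        # Exclusive prefix sum: the starting output row of each chunk.
        offsets = [0] + list(accumulate(counts))
        out = [None] * offsets[-1]

        def fill(args):
            (lo, hi), pos = args
            for i in range(lo, hi):
                if flat[i] != 0:
                    out[pos] = unravel(i)
                    pos += 1

        # Pass 2: each worker writes only into its own disjoint slice.
        list(pool.map(fill, zip(ranges, offsets)))
    return out


# A 2x3 buffer stored in row-major order.
result = parallel_nonzero([0, 3, 0, 0, 5, 7], (2, 3), num_threads=3)
print(result)  # [(0, 1), (1, 1), (1, 2)]
```

Because of the GIL, Python threads will not reproduce the speedups in the table; the sketch only mirrors the structure of a count pass followed by a write pass. If the chunks were visited in a reordered (non-linear) dimension order, the prefix-sum offsets would no longer correspond to sorted output positions, which is presumably the problem `enforce_linear_iteration` addresses.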