pytorch
99f2000a - Migrate nonzero from TH to ATen (CPU) (#59149)

Commit

3 years ago

Migrate nonzero from TH to ATen (CPU) (#59149)

Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.

| Shape      | Before  | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2150 us |            551 us |
| 128,128,32 | 1250 us |          1020 us |            197 us |
| 64,128,32  |  581 us |           495 us |             99 us |
| 32,128,32  |  292 us |           255 us |             83 us |
| 16,128,32  |  147 us |           126 us |             75 us |
| 8,128,32   |   75 us |            65 us |             65 us |
| 4,128,32   |   39 us |            33 us |             33 us |
| 2,128,32   |   20 us |            18 us |             18 us |
| 1,128,32   |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
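The reason per-thread counts are needed can be illustrated with a small two-pass sketch in plain Python. This is not the ATen implementation: `parallel_nonzero`, `unravel`, and the `num_chunks` parameter are hypothetical names chosen for illustration, and the input is assumed to be a flat list already in linear (row-major) order, which is the ordering `enforce_linear_iteration` is described as preserving. Each chunk first counts its nonzero elements, a prefix sum over those counts gives each chunk its starting offset in the output, and the chunks can then write independently without overlapping.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def unravel(flat_index, shape):
    """Convert a row-major flat index into a tuple of per-dimension indices."""
    coords = []
    for size in reversed(shape):
        flat_index, remainder = divmod(flat_index, size)
        coords.append(remainder)
    return tuple(reversed(coords))


def parallel_nonzero(values, shape, num_chunks=4):
    """Two-pass nonzero over `values`, a flat list in row-major (linear) order."""
    n = len(values)
    chunk = max(1, -(-n // num_chunks))  # ceiling division
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        # Pass 1: each chunk counts its own nonzero elements.
        counts = list(pool.map(
            lambda b: sum(1 for i in range(*b) if values[i] != 0), bounds))

        # A prefix sum over the counts gives each chunk's output offset.
        offsets = [0] + list(accumulate(counts))
        out = [None] * offsets[-1]

        # Pass 2: each chunk writes its indices into its own disjoint slice.
        def write(args):
            (start, end), pos = args
            for i in range(start, end):
                if values[i] != 0:
                    out[pos] = unravel(i, shape)
                    pos += 1

        list(pool.map(write, zip(bounds, offsets)))
    return out
```

Because every chunk knows its offset before writing, the result is in the same order regardless of how many chunks are used or which thread finishes first. The sketch shows only the bookkeeping; Python threads will not reproduce the speedups in the table above.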

Author

peterbell10

Committer

facebook-github-bot

Parents

b4d30bb5

Files (19)

BUILD.bazel
aten/src
- ATen
  - LegacyTHFunctionsCPU.cpp
  - LegacyTHFunctionsCPU.h
  - ParallelOpenMP.h
  - TensorIterator.cpp
  - TensorIterator.h
  - native
    - ReduceOps.cpp
    - TensorAdvancedIndexing.cpp
    - native_functions.yaml
- TH
  - CMakeLists.txt
  - THTensorEvenMoreMath.cpp
  - THTensorMath.cpp
  - generic
    - THTensorEvenMoreMath.cpp
    - THTensorMath.cpp
    - THTensorMath.h
c10/util
- Unroll.h
test
- test_sparse.py
- test_unary_ufuncs.py
tools
- build_variables.bzl
