pytorch
99f2000a - Migrate nonzero from TH to ATen (CPU) (#59149)

Commit
3 years ago
Migrate nonzero from TH to ATen (CPU) (#59149)

Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.

| Shape      | Before  | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2150 us |            551 us |
| 128,128,32 | 1250 us |          1020 us |            197 us |
| 64,128,32  |  581 us |           495 us |             99 us |
| 32,128,32  |  292 us |           255 us |             83 us |
| 16,128,32  |  147 us |           126 us |             75 us |
| 8,128,32   |   75 us |            65 us |             65 us |
| 4,128,32   |   39 us |            33 us |             33 us |
| 2,128,32   |   20 us |            18 us |             18 us |
| 1,128,32   |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
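The role of the per-thread counts can be illustrated with a minimal two-pass sketch in plain Python. This is not the ATen implementation: `chunk_bounds` and `parallel_nonzero` are hypothetical names, the input is assumed to be a flat list already in linear (row-major) order, and Python threads are used only to show the structure, not to reproduce the speedups in the table. The first pass counts nonzeros per chunk, an exclusive prefix sum turns the counts into output offsets, and the second pass lets each chunk write into its own disjoint slice of the output.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def chunk_bounds(n, num_chunks):
    """Split range(n) into at most num_chunks contiguous [begin, end) ranges."""
    if n == 0:
        return []
    step = -(-n // num_chunks)  # ceiling division
    return [(begin, min(begin + step, n)) for begin in range(0, n, step)]


def parallel_nonzero(flat, num_chunks=4):
    """Return the indices of nonzero elements of `flat`, in increasing order.

    Hypothetical helper for illustration only; it mirrors the idea of
    per-thread counts described above, not PyTorch's actual code.
    """
    bounds = chunk_bounds(len(flat), num_chunks)

    def count(bound):
        begin, end = bound
        return sum(1 for i in range(begin, end) if flat[i] != 0)

    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        # Pass 1: each worker counts the nonzeros in its own chunk.
        counts = list(pool.map(count, bounds))

        # Exclusive prefix sum: offsets[k] is where chunk k starts writing.
        offsets = [0] + list(accumulate(counts))
        out = [None] * offsets[-1]

        def fill(args):
            (begin, end), pos = args
            for i in range(begin, end):
                if flat[i] != 0:
                    out[pos] = i  # slices are disjoint, so no locking is needed
                    pos += 1

        # Pass 2: each worker writes its indices at its precomputed offset.
        list(pool.map(fill, zip(bounds, offsets)))

    return out
```

Because every chunk covers a contiguous range of linear indices and writes at an offset derived from the counts of all earlier chunks, the output stays sorted regardless of which worker finishes first. This only holds if elements are visited in linear order, which is the guarantee the commit message attributes to `enforce_linear_iteration`.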
  • BUILD.bazel
  • aten/src
    • ATen
      • LegacyTHFunctionsCPU.cpp
      • LegacyTHFunctionsCPU.h
      • ParallelOpenMP.h
      • TensorIterator.cpp
      • TensorIterator.h
      • native
        • ReduceOps.cpp
        • TensorAdvancedIndexing.cpp
        • native_functions.yaml
    • TH
      • CMakeLists.txt
      • THTensorEvenMoreMath.cpp
      • THTensorMath.cpp
      • generic
        • THTensorEvenMoreMath.cpp
        • THTensorMath.cpp
        • THTensorMath.h
  • c10/util
    • Unroll.h
  • test
    • test_sparse.py
    • test_unary_ufuncs.py
  • tools
    • build_variables.bzl