Optimize the CPU performance of the nonzero function. (#15190)
Summary:
Optimized the CPU implementation of nonzero. It is now about 2x faster on average than numpy's nonzero.
It can be optimized further for 1D tensors and boolean tensors.
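For reference, a minimal pure-Python sketch of what nonzero computes: the indices of all nonzero elements in row-major order, one index tuple per element. This is not the optimized kernel from this change. The name `nonzero_reference` and the flat-list-plus-shape input are hypothetical and used for illustration only.

```python
from itertools import product


def nonzero_reference(data, shape):
    """Return index tuples of the nonzero entries of a row-major flat list.

    `data` is a flat list holding the elements in row-major order and
    `shape` is a tuple of dimension sizes. Both are illustrative
    assumptions, not the tensor layout used by the real implementation.
    """
    indices = []
    # product() over the per-dimension ranges yields index tuples in
    # row-major order, so position k lines up with data[k].
    for flat_idx, idx in enumerate(product(*(range(d) for d in shape))):
        if data[flat_idx] != 0:
            indices.append(idx)
    return indices


if __name__ == "__main__":
    # A 2x3 input: [[0, 1, 0], [2, 0, 3]]
    print(nonzero_reference([0, 1, 0, 2, 0, 3], (2, 3)))
```

A reference like this can serve as a correctness check when comparing an optimized implementation against expected indices on small inputs.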
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15190
Differential Revision: D13468570
Pulled By: VitalyFedyunin
fbshipit-source-id: e55ce54d60626a42d9a10a02e407856458b8055e