Implementation of torch.isin() (#53125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3025
## Background
This PR implements a function similar to numpy's [`isin()`](https://numpy.org/doc/stable/reference/generated/numpy.isin.html#numpy.isin).
The op supports integral and floating point types on CPU and CUDA (+ half & bfloat16 for CUDA). Inputs can be one of:
* (Tensor, Tensor)
* (Tensor, Scalar)
* (Scalar, Tensor)
Internally, one of two algorithms is selected based on the number of elements vs. test elements. The heuristic for deciding which algorithm to use is taken from [numpy's implementation](https://github.com/numpy/numpy/blob/fb215c76967739268de71aa4bda55dd1b062bc2e/numpy/lib/arraysetops.py#L575): if `len(test_elements) < 10 * len(elements) ** 0.145`, then a naive brute-force checking algorithm is used. Otherwise, a stablesort-based algorithm is used.
I've done some preliminary benchmarking to verify this heuristic on a devgpu, and determined for a limited set of tests that a power value of `0.407` instead of `0.145` is a better inflection point. For now, the heuristic has been left to match numpy's, but input is welcome for the best way to select it or whether it should be left the same as numpy's.
Tests are adapted from numpy's [isin and in1d tests](https://github.com/numpy/numpy/blob/7dcd29aaafe1ab8be4be04d3c793e5bcaf17459f/numpy/lib/tests/test_arraysetops.py).
Note: my locally generated docs look terrible for some reason, so I'm not including the screenshot for them until I figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53125
Test Plan:
```
python test/test_ops.py # Ex: python test/test_ops.py TestOpInfoCPU.test_supported_dtypes_isin_cpu_int32
python test/test_sort_and_select.py # Ex: python test/test_sort_and_select.py TestSortAndSelectCPU.test_isin_cpu_int32
```
Reviewed By: soulitzer
Differential Revision: D29101165
Pulled By: jbschlosser
fbshipit-source-id: 2dcc38d497b1e843f73f332d837081e819454b4e